Abstract



Randomized techniques play a fundamental role in theoretical computer 
science and discrete mathematics, in particular for the design of efficient al- 
gorithms and construction of combinatorial objects. The basic goal in deran- 
domization theory is to eliminate or reduce the need for randomness in such 
randomized constructions. Towards this goal, numerous fundamental notions 
have been developed to provide a unified framework for approaching various 
derandomization problems and to improve our general understanding of the 
power of randomness in computation. Two important classes of such tools are 
pseudorandom generators and randomness extractors. Pseudorandom genera- 
tors transform a short, purely random, sequence into a much longer sequence 
that looks random, while extractors transform a weak source of randomness 
into a perfectly random one (or one with much better qualities, in which case 
the transformation is called a randomness condenser). 

In this thesis, we explore some applications of the fundamental notions 
in derandomization theory to problems outside the core of theoretical com- 
puter science, and in particular, certain problems related to coding theory. 
First, we consider the wiretap channel problem, which involves a communication
system in which an intruder can eavesdrop on a limited portion of the
transmissions. We utilize randomness extractors to construct efficient and 
information-theoretically optimal communication protocols for this model. 

Then we consider the combinatorial group testing problem. In this clas- 
sical problem, one aims to determine a set of defective items within a large 
population by asking a number of queries, where each query reveals whether 
a defective item is present within a specified group of items. We use ran- 
domness condensers to explicitly construct optimal, or nearly optimal, group 
testing schemes for a setting where the query outcomes can be highly unre- 
liable, as well as the threshold model where a query returns positive if the 
number of defectives passes a certain threshold.

Next, we use randomness condensers and extractors to design ensembles 
of error-correcting codes that achieve the information-theoretic capacity of a 
large class of communication channels, and then use the obtained ensembles 
for construction of explicit capacity achieving codes. Finally, we consider 
the problem of explicit construction of error-correcting codes on the Gilbert- 
Varshamov bound and extend the original idea of Nisan and Wigderson to 
obtain a small ensemble of codes, mostly achieving the bound, under suitable 
computational hardness assumptions. 

Keywords: Derandomization theory, randomness extractors, pseudorandomness,
wiretap channels, group testing, error-correcting codes.



Résumé

Randomization techniques play a fundamental role in theoretical computer
science and discrete mathematics, in particular for the design of efficient
algorithms and for the construction of combinatorial objects. The main
objective of derandomization theory is to eliminate or reduce the need for
randomness in such constructions. To this end, numerous fundamental notions
have been developed, on the one hand to create a unified framework for
approaching different derandomization problems, and on the other hand to
better understand the contribution of randomness in computer science.
Pseudorandom generators and extractors are two important classes of such
tools. Pseudorandom generators transform a short, purely random sequence
into a much longer sequence that appears random. Randomness extractors
transform a weakly random source into a perfectly random source (or into a
source of better quality; in the latter case, the transformation is called a
randomness condenser).

In this thesis, we explore some applications of the fundamental notions of
derandomization theory to problems on the periphery of theoretical computer
science, and in particular to certain problems pertaining to coding theory.
We first consider the wiretap channel problem, which consists of a
communication system in which an intruder can intercept a limited portion of
the transmissions. We use extractors to construct, for this model,
communication protocols that are efficient and optimal from the point of view
of information theory.

We then study the combinatorial group testing problem. In this classical
problem, one sets out to determine a set of defective objects within a large
population through a certain number of questions, where each answer reveals
whether a defective object belongs to a certain set of objects. We use
condensers to explicitly construct optimal or nearly optimal group tests, in
a setting where the answers to the questions can be highly unreliable, and in
the threshold model where the result of a question is positive if the number
of defective objects exceeds a certain threshold.

Next, we use condensers and extractors to design ensembles of
error-correcting codes that achieve the capacity (in the information-theoretic
sense) of a large number of communication channels. We then use the obtained
ensembles for the construction of explicit codes that achieve capacity.
Finally, we consider the problem of explicit construction of error-correcting
codes that attain the Gilbert-Varshamov bound, and revisit the original idea
of Nisan and Wigderson to obtain a small ensemble of codes, most of which
attain the bound, under certain computational hardness assumptions.

Keywords: Derandomization theory, randomness extractors, pseudorandomness,
wiretap channels, group testing, error-correcting codes.
Introduction



Over the decades, the role of randomness in computation has proved to be one 
of the most intriguing subjects of study in computer science. Considered as a 
fundamental computational resource, randomness has been extensively used 
as an indispensable tool in design and analysis of algorithms, combinatorial 
constructions, cryptography, and computational complexity. 

As an illustrative example of the power of randomness in algorithms, consider
a clustering problem, in which we wish to partition a collection of items
into two groups. Suppose that some pairs of items are marked as inconsistent,
meaning that they should preferably not fall in the same group. Of course,
it might be simply impossible to group the items in such a way that no in- 
consistencies occur within the two groups. For that reason, it makes sense 
to consider the objective of minimizing the number of inconsistencies induced 
by the chosen partitioning. Suppose that we are asked to color individual 
items red or blue, where the items marked by the same color form each of 
the two groups. How can we design a strategy that maximizes the number 
of inconsistent pairs that fall in different groups? The basic rule of thumb in 
randomized algorithm design suggests that 

When unsure making decisions, try flipping coins! 

Thus a naive strategy for assigning colors to items would be to flip a fair coin
for each item. If the coin falls Heads, we mark the item blue, and otherwise 
red. 

How can the above strategy possibly be reasonable? After all, we are
defining the groups without giving the slightest thought to the given structure
of the inconsistent pairs! Remarkably, a simple analysis can prove that the 
coin-flipping strategy is in fact a quite reasonable one. To see why, consider 
any inconsistent pair. The chance that the two items are assigned the same 
color is exactly one half. Thus, we expect that half of the inconsistent pairs 
end up falling in different groups. By repeating the algorithm a few times and
checking the outcomes, we can be confident that an assignment satisfying half of
the inconsistency constraints is found after a few trials.
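
The coin-flipping strategy and the repeat-and-check step can be sketched in a few lines of Python. This is only an illustration: the function names, the toy instance, and the cap on the number of trials are assumptions made here, not part of the original discussion.

```python
import random

def cut_size(edges, colors):
    """Count inconsistent pairs whose two items received different colors."""
    return sum(1 for u, v in edges if colors[u] != colors[v])

def random_coloring(num_items, rng):
    """Flip one fair coin per item: True means blue, False means red."""
    return [rng.random() < 0.5 for _ in range(num_items)]

def color_until_half(edges, num_items, rng, max_trials=1000):
    """Repeat the coin-flipping strategy until half of the pairs are separated."""
    for _ in range(max_trials):
        colors = random_coloring(num_items, rng)
        if 2 * cut_size(edges, colors) >= len(edges):
            return colors
    return None  # possible in principle, but extremely unlikely on small inputs

# Hypothetical toy instance: 5 items and 6 inconsistent pairs.
edges = [(0, 1), (1, 2), (2, 3), (3, 4), (4, 0), (0, 2)]
colors = color_until_half(edges, 5, random.Random(2024))
```

Note that the coloring step itself never looks at `edges`; only the final check does.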

We see that a remarkably simple algorithm that does not even read its
input can attain an approximate solution to the clustering problem in which
the number of inconsistent pairs assigned to different groups is no less than
half the maximum possible. However, our algorithm used a valuable resource,
namely random coin flips, that greatly simplified its task. In this case, it is
not hard to come up with an efficient (i.e., polynomial-time) algorithm that
does equally well without using any randomness. However, designing such an
algorithm and analyzing its performance is admittedly a substantially more
difficult task than what we demonstrated within a few paragraphs above.
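
One standard way to obtain such a deterministic algorithm is a greedy rule, which can be viewed as an instance of the method of conditional expectations. The sketch below is an assumption about which deterministic algorithm one might use, since the text does not name one. Items are colored one at a time, and each item takes the color that separates it from at least half of its already-colored inconsistent partners.

```python
def separated_pairs(edges, colors):
    """Count inconsistent pairs whose two items received different colors."""
    return sum(1 for u, v in edges if colors[u] != colors[v])

def greedy_coloring(edges, num_items):
    """Deterministic coloring separating at least half of the pairs.

    Assumes no pair joins an item to itself.
    """
    neighbors = [[] for _ in range(num_items)]
    for u, v in edges:
        neighbors[u].append(v)
        neighbors[v].append(u)
    colors = [None] * num_items  # None means "not colored yet"
    for item in range(num_items):
        blue_partners = sum(1 for w in neighbors[item] if colors[w] is True)
        red_partners = sum(1 for w in neighbors[item] if colors[w] is False)
        # Taking the minority color among colored partners separates the
        # item from at least half of them.
        colors[item] = blue_partners <= red_partners  # True = blue
    return colors
```

Every pair is accounted for exactly once, at the moment its second item is colored, and at each such moment at least half of the newly accounted pairs are separated. The guarantee therefore holds on every input, not just in expectation.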

As it turns out, finding an optimal solution to our clustering problem 
above is an intractable problem (in technical terms, it is NP-hard), and even 
obtaining an approximation ratio better than 16/17 ≈ 0.941 is so [79]. Thus
the trivial coin-flipping algorithm indeed obtains a reasonable solution. In a
celebrated work, Goemans and Williamson [69] improve the approximation
ratio to about 0.878, again using randomization.¹ A deterministic algorithm
achieving the same quality was later discovered [104], though it is much more
complicated to analyze. 

Another interesting example demonstrating the power of randomness in 
algorithms is the primality testing problem, in which the goal is to decide 
whether a given n-digit integer is prime or composite. While efficient
(polynomial-time in n) randomized algorithms were discovered for this problem as
early as the 1970s (e.g., the Solovay-Strassen [140] and Miller-Rabin [107, 121]
algorithms), a deterministic polynomial-time algorithm for primality testing
was found decades later, with the breakthrough work of Agrawal, Kayal, and
Saxena [3], first published in 2002. Even though this algorithm provably works 
in polynomial time, randomized methods still tend to be more favorable and 
more efficient for practical applications. 
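
To give a sense of how short the randomized tests are, here is a minimal Python sketch of the Miller-Rabin test in its textbook form. The function name, the default number of rounds, and the small trial-division step are choices made for this illustration.

```python
import random

def is_probable_prime(n, rounds=20, rng=random):
    """Miller-Rabin test.

    Never rejects a prime; accepts a composite with probability
    at most 4**(-rounds).
    """
    if n < 2:
        return False
    for p in (2, 3, 5, 7):
        if n % p == 0:
            return n == p
    # Write n - 1 = 2**s * d with d odd.
    d, s = n - 1, 0
    while d % 2 == 0:
        d //= 2
        s += 1
    for _ in range(rounds):
        a = rng.randrange(2, n - 1)  # random base in [2, n - 2]
        x = pow(a, d, n)
        if x in (1, n - 1):
            continue
        for _ in range(s - 1):
            x = pow(x, 2, n)
            if x == n - 1:
                break
        else:
            return False  # the base a witnesses that n is composite
    return True
```

The only source of error is one-sided: a composite number may be declared prime if every sampled base fails to be a witness.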

The primality testing algorithm of Agrawal et al. can be regarded as a de- 
randomization of a particular instance of the polynomial identity testing prob- 
lem. Polynomial identity testing generalizes the high-school-favorite problem 
of verifying whether a pair of polynomials expressed as closed form formulae 
expand to identical polynomials. For example, the following is an 8-variate 
identity 

\[
\begin{aligned}
(x_1^2 + x_2^2 + x_3^2 + x_4^2)(y_1^2 + y_2^2 + y_3^2 + y_4^2) ={}
& (x_1 y_1 - x_2 y_2 - x_3 y_3 - x_4 y_4)^2 + (x_1 y_2 + x_2 y_1 + x_3 y_4 - x_4 y_3)^2 + {} \\
& (x_1 y_3 - x_2 y_4 + x_3 y_1 + x_4 y_2)^2 + (x_1 y_4 + x_2 y_3 - x_3 y_2 + x_4 y_1)^2
\end{aligned}
\]

which turns out to be valid. When the number of variables and the complexity
of the expressions grow, the task of verifying identities becomes much more
challenging using naive methods.

¹ Improving upon the approximation ratio obtained by this algorithm turns out to be
NP-hard under a well-known conjecture [90].

This is where the power of randomness comes into play again. A fundamental
idea due to Schwartz and Zippel [131, 169] shows that the following
approach indeed works:

Evaluate the two polynomials at sufficiently many randomly cho- 
sen points, and identify them as identical if and only if all evalua- 
tions agree. 

It turns out that the above simple idea leads to a randomized efficient algo- 
rithm for testing identities that may err with an arbitrarily small probability. 
Despite substantial progress, to this date no polynomial-time deterministic
algorithm for solving the general identity testing problem is known, and a full
derandomization of Schwartz-Zippel's algorithm remains a challenging open 
problem in theoretical computer science. 
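
As a minimal sketch of the random-evaluation approach, the following Python code checks the 8-variate identity above. Working modulo the prime 2^61 - 1, the number of trials, and all function names are assumptions of this illustration. Two different polynomials of total degree d agree at a random point with probability at most d divided by the size of the field, so a single disagreement is conclusive while agreement is only strong evidence.

```python
import random

P = 2**61 - 1  # a large prime; all comparisons are made modulo P

def lhs(x, y):
    """Product of the two sums of four squares."""
    return sum(a * a for a in x) * sum(b * b for b in y)

def rhs(x, y):
    """Sum of the four squared bilinear forms."""
    x1, x2, x3, x4 = x
    y1, y2, y3, y4 = y
    return ((x1 * y1 - x2 * y2 - x3 * y3 - x4 * y4) ** 2
            + (x1 * y2 + x2 * y1 + x3 * y4 - x4 * y3) ** 2
            + (x1 * y3 - x2 * y4 + x3 * y1 + x4 * y2) ** 2
            + (x1 * y4 + x2 * y3 - x3 * y2 + x4 * y1) ** 2)

def probably_identical(f, g, trials=10, rng=random):
    """Compare f and g at random points of the field with P elements."""
    for _ in range(trials):
        x = [rng.randrange(P) for _ in range(4)]
        y = [rng.randrange(P) for _ in range(4)]
        if (f(x, y) - g(x, y)) % P != 0:
            return False  # one disagreement proves the polynomials differ
    return True
```

For example, `probably_identical(lhs, rhs)` accepts, whereas perturbing either side by a single monomial makes the test reject except with negligible probability.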

The discussion above, among many other examples, makes the strange 
power of randomness evident. Namely, in certain circumstances the power of 
randomness makes algorithms more efficient, or simpler to design and analyze. 
Moreover, it is not yet clear how to perform certain computational tasks (e.g., 
testing for general polynomial identities) without using randomness. 

Apart from algorithms, randomness has been used as a fundamental tool 
in various other areas, a notable example being combinatorial constructions. 
Combinatorial objects are of fundamental significance for a vast range of the- 
oretical and practical problems. Often solving a practical problem (e.g., a 
real-world optimization problem) reduces to construction of suitable combi- 
natorial objects that capture the inherent structure of the problem. Examples 
of such combinatorial objects include graphs, set systems, codes, designs, ma- 
trices, or even sets of integers. For these constructions, one has a certain 
structural property of the combinatorial object in mind (e.g., mutual
intersections of a set system consisting of subsets of a universe) and seeks an
instance of the object that optimizes this property in the best possible
way (e.g., the largest possible set system with bounded mutual intersections).

The task of constructing suitable combinatorial objects turns out to be quite
challenging at times. Remarkably, in numerous situations the power of ran- 
domness greatly simplifies the task of constructing the ideal object. A pow- 
erful technique in combinatorics, dubbed as the probabilistic method (see (Hj) 
is based on the following idea: 

When out of ideas finding the right combinatorial object, try a 
random one! 

Surprisingly, in many cases this seemingly naive strategy significantly beats 
the most brilliant constructions that do not use any randomness. An illumi- 
nating example is the problem of constructing Ramsey graphs. It is well known 
that in a group of six or more people, either there are at least three people
who know each other or three who do not know each other. More generally,
Ramsey theory shows that for every positive integer K, there is an integer N
such that in a group of N or more people, either there are at least K peo- 
ple who mutually know each other (called a clique of size K) or K who are 
mutually unfamiliar with one another (called an independent set of size K). 
Ramsey graphs capture the reverse direction: 

For a given N, what is the smallest K such that there is a group 
of N people with no cliques or independent sets of size K or more? 
And how can an example of such a group be constructed? 

In graph-theoretic terms (where mutual acquaintances are captured by
edges), an undirected graph with $N := 2^n$ vertices is called a Ramsey graph
with entropy $k$ if it has no clique or independent set of size $K := 2^k$ (or
larger). The Ramsey graph construction problem is to efficiently construct a
graph with smallest possible entropy $k$.

Constructing a Ramsey graph with entropy $k = (n + 1)/2$ is already nontrivial.
However, the following Hadamard graph does the job [35]: Each vertex
of the graph is associated with a binary vector of length $n$, and there is an
edge between two vertices if their corresponding vectors are orthogonal over
the binary field. A much more involved construction, due to Barak et al. [9]
(which remains the best deterministic construction to date), attains an entropy
$k = n^{o(1)}$.

A brilliant, but quite simple, idea due to Erdős [57] demonstrates the power
of randomness in combinatorial constructions: Construct the graph randomly,
by deciding whether to put an edge between every pair of vertices by flipping
a fair coin. It is easy to see that the resulting graph is, with overwhelming
probability, a Ramsey graph with entropy $k = \log n + 2$. It also turns out that
this is about the lowest entropy one can hope for! Note the significant gap
between what is achieved by a simple, probabilistic construction and what
is achieved by the best known deterministic constructions.
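
The "easy to see" step is a union bound over all vertex subsets of size $K$. The calculation below is a sketch of the standard argument, not a quotation from [57]. It assumes that logarithms are taken to base 2, so that $K = 2^{\log n + 2} = 4n$, and that $N = 2^n$ with $n \geq 1$. A fixed set of $K$ vertices forms a clique or an independent set with probability $2 \cdot 2^{-\binom{K}{2}}$, so

```latex
\[
\Pr[\text{some } K\text{-subset is a clique or an independent set}]
  \;\le\; \binom{N}{K}\, 2^{\,1 - \binom{K}{2}}
  \;\le\; \frac{2^{nK}}{K!}\, 2^{\,1 - K(K-1)/2}
  \;=\; \frac{2^{\,1 + 4n^2 - (8n^2 - 2n)}}{(4n)!}
  \;=\; \frac{2^{\,1 + 2n - 4n^2}}{(4n)!}.
\]
```

The right-hand side is already below 1/40 for $n = 1$ and decays super-exponentially in $n$, which is the sense in which a random graph is a Ramsey graph with overwhelming probability.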

Even though the examples discussed above clearly demonstrate the power 
of randomness in algorithm design and combinatorics, a few issues that may
seem unfavorable are inherently tied to the use of randomness as a
computational resource:

1. A randomized algorithm takes an abundance of fair, and independent, 
coin flips for granted, and the analysis may fall apart if this assumption 
is violated. For example, in the clustering example above, if the coin 
flips are biased or correlated, the 0.5 approximation ratio can no longer
be guaranteed. This raises a fundamental question: 



Does "pure randomness" even exist? If so, how can we in- 
struct a computer program to produce purely random coin 
flips? 

2. Even though the error probability of randomized algorithms (such as the 
primality testing algorithms mentioned above) can be made arbitrarily 
small, it remains nonzero. In certain cases where a randomized algorithm 
never errs, its running time may vary depending on the random choices 
being made. We can never be completely sure whether an error-prone 
algorithm has really produced the right outcome, or whether one with 
a varying running time is going to terminate in a reasonable amount of 
time (even though we can be almost confident that it does). 

3. As we saw for Ramsey graphs, the probabilistic method is a powerful tool 
in showing that combinatorial objects with certain properties exist, and 
in most cases it additionally shows that a random object almost surely
achieves the desired properties. Even though for certain applications a
randomly produced object is good enough, in general there might be no
easy way to certify whether it indeed satisfies the properties sought
for. For the example of Ramsey graphs, while almost every graph is a 
Ramsey graph with a logarithmically small entropy, it is not clear how 
to certify whether a given graph satisfies this property. This might be an 
issue for certain applications, when an object with guaranteed properties 
is needed. 

The basic goal of derandomization theory is to address the above-mentioned 
and similar issues in a systematic way. A central question in derandomiza- 
tion theory deals with efficient ways of simulating randomness, or relying on 
weak randomness when perfect randomness (i.e., a steady stream of fair and 
independent coin flips) is not available. A mathematical formulation of
randomness is captured by the notion of entropy, introduced by Shannon [136],
that quantifies randomness as the amount of uncertainty in the outcome of 
a process. Various sources of "unpredictable" phenomena can be found in 
nature. This can be in the form of electrical noise, thermal noise, ambient sound
input, an image captured by a video camera, or even a user's input given to an
input device such as a keyboard. Even though it is conceivable to assume
that a bit-sequence generated by any such source contains a certain amount of
entropy, the randomness being offered might be far from perfect. Randomness
extractors are fundamental combinatorial, as well as computational, objects 
that aim to address this issue. 

As an example to illustrate the concept of extractors, suppose that we
have obtained several independent bit-streams $X_1, X_2, \ldots, X_r$ from various
physically random sources. Being obtained from physical sources, not much is 
known about the structure of these sources, and the only assumption that we 
can be confident about is that they produce a substantial amount of entropy. 
An extractor is a function that combines these sources into one, perfectly
random, source. In symbols, we have

\[
f(X_1, X_2, \ldots, X_r) = Y,
\]

where the output source $Y$ is purely random provided that the input sources
are reasonably (but not fully) random. To be of any practical use, the
extractor $f$ must be efficiently computable as well. A more general class of
functions, dubbed condensers, are those that do not necessarily transform
imperfect randomness into perfect randomness, but nevertheless substantially
purify the randomness being given. For instance, as a condenser, the function
$f$ may be expected to produce an output sequence whose entropy is 90% of the
optimal entropy offered by perfect randomness.
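
A classical concrete instance with $r = 2$, not taken from the text above, is the inner-product function: it outputs the inner product of two independent bit-strings over the binary field. The Python sketch below computes the exact bias of this one output bit for two hypothetical weak sources, each uniform on a structured set of 64 strings of length 8. The source definitions and all names are assumptions chosen for illustration. A standard bound (Lindsey's lemma) guarantees that the bias is at most $\sqrt{2^n / (|A| \cdot |B|)}$ for sources uniform on sets $A$ and $B$.

```python
from itertools import product
from math import sqrt

def inner_product_bit(x, y):
    """Extract one bit from two bit-tuples: their inner product over GF(2)."""
    return sum(a & b for a, b in zip(x, y)) % 2

def output_bias(source_a, source_b):
    """|Pr[bit = 0] - Pr[bit = 1]| for x, y uniform on the two given sets."""
    total = sum(1 - 2 * inner_product_bit(x, y)
                for x in source_a for y in source_b)
    return abs(total) / (len(source_a) * len(source_b))

n = 8
cube = list(product((0, 1), repeat=n))
# Two hypothetical weak sources, each uniform on 64 of the 256 strings.
source_a = [x for x in cube if x[0] == x[1] and x[2] == 0]
source_b = [y for y in cube if sum(y) % 2 == 0 and y[7] == y[0]]
bias = output_bias(source_a, source_b)
bound = sqrt(2**n / (len(source_a) * len(source_b)))  # equals 0.25 here
```

Neither source is close to uniform on its own, yet the combined output bit is guaranteed to be within the stated bound of a fair coin.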

Intuitively, there is a trade-off between structure and randomness. A se- 
quence of fair coin flips is extremely unpredictable in that one cannot bet on 
predicting the next coin flip and expect to gain any advantage out of it. On the 
other extreme, a sequence such as the one given by the digits of $\pi = 3.14159265\ldots$
may look random but is in fact perfectly structured. Indeed one can use 
a computer program to perfectly predict the outcomes of this sequence. A 
physical source, on the other hand, may have some inherent structure in it. In 
particular, the outcome of a physical process at a certain point might be more 
or less predictable, dictated by physical laws, from the outcomes observed 
immediately prior to that time. However, the degree of predictability may of 
course not be as high as in the case of $\pi$.

From a combinatorial point of view, an extractor is a combinatorial object 
that neutralizes any kind of structure that is inherent in a random source and
extracts the "random component" out (if there is any). On the other hand,
in order to be of any use, an extractor must be computationally efficient. At
first sight, it may look somewhat surprising to learn that such objects may
even exist! In fact, as in the case of Ramsey graphs, the probabilistic method 
can be used to show that a randomly chosen function is almost surely a decent 
extractor. However, a random function is obviously not good enough as an 
extractor since the whole purpose of an extractor is to eliminate the need 
for pure randomness. Thus for most applications, an extractor (and more 
generally, a condenser) is required to be efficiently computable and to use as
little auxiliary pure randomness as possible.

While randomness extractors were originally studied for the main purpose 
of eliminating the need for pure randomness in randomized algorithms, they 
have found surprisingly diverse applications in different areas of combinatorics, 
computer science, and related fields. Among many such developments, one can
mention construction of good expander graphs [161] and Ramsey graphs [9]
(in fact the best known construction of Ramsey graphs can be considered a
byproduct of several developments in extractor theory), communication
complexity [35], algebraic complexity theory [124], distributed computing (e.g.,
[73, 128, 171]), data structures (e.g., [147]), hardness of optimization problems
[111, 170], cryptography (see, e.g., [45]), coding theory [149], signal processing
[86], and various results in structural complexity theory (e.g., [71]).

In this thesis we extend such connections to several fundamental problems
related to coding theory. In the following we present a brief summary of the
individual problems that are studied in each chapter.

The Wiretap Channel Problem 

The wiretap channel problem studies reliable transmission of messages over a 
communication channel which is partially observable by a wiretapper. As a 
basic example, suppose that we wish to transmit a sensitive document over 
the internet. Loosely speaking, the data is transmitted in the form of packets,
consisting of blocks of information, through the network. 

Packets may be transmitted along different paths over the network through 
a cloud of intermediate transmitters, called routers, until delivered at the 
destination. Now an adversary who has access to a set of the intermediate 
routers may be able to learn a substantial amount of information about the 
message being transmitted, and thereby render the communication system 
insecure. 

A natural solution for assuring secrecy in transmission is to use a standard 
cryptographic scheme to encrypt the information at the source. However, the 
information-theoretic limitation of the adversary in the above scenario (that is, 
the fact that not all of the intermediate routers, but only a limited number of 
them are being eavesdropped) makes it possible to provably guarantee secure 
transmission by using a suitable encoding at the source. In particular, in 
a wiretap scheme, the original data is encoded at the source to a slightly 
redundant sequence, that is then transmitted to the recipient. As it turns 
out, the scheme can be designed in such a way that no information is leaked 
to the intruder and moreover no secrets (e.g., an encryption key) need to be 
shared between the two parties prior to transmission. 

We study this problem in Chapter 3. The main contribution of this chapter
is a construction of information-theoretically secure and optimal wiretap
schemes that guarantee secrecy in various settings of the problem. In
particular, the scheme can be applied to point-to-point communication models as
well as networks, even in presence of noise or active intrusion (i.e., when the
adversary not only eavesdrops, but also alters the information being
transmitted). The construction uses an explicit family of randomness extractors as
the main building block. 
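
To make the idea of keyless secrecy concrete, here is a toy Python sketch. It is deliberately not the extractor-based construction of this thesis: it is the simplest XOR-based splitting, with a rate of only one over the number of paths, and all names are hypothetical. The message is encoded into several shares sent along different paths. An adversary who observes all but one share sees only uniformly random bytes, while the recipient recovers the message by XOR-ing all shares, with no key agreed in advance.

```python
import secrets
from functools import reduce

def xor_bytes(a, b):
    """Bytewise XOR of two equal-length byte strings."""
    return bytes(p ^ q for p, q in zip(a, b))

def encode(message, num_paths):
    """Split message into num_paths shares whose XOR equals the message.

    Any num_paths - 1 of the shares are jointly uniformly random.
    """
    pads = [secrets.token_bytes(len(message)) for _ in range(num_paths - 1)]
    last = reduce(xor_bytes, pads, message)
    return pads + [last]

def decode(shares):
    """Recover the message from the complete list of shares."""
    return reduce(xor_bytes, shares)
```

For instance, `decode(encode(b"sensitive document", 3))` returns the original bytes. The schemes studied in Chapter 3 aim at the same kind of guarantee with far less redundancy.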

Combinatorial Group Testing 

Group testing is a classical combinatorial problem that has applications in 
surprisingly diverse and seemingly unrelated areas, from data structures to 
coding theory to biology.

Intuitively, the problem can be described as follows: Suppose that blood 
tests are taken from a large population (say hundreds of thousands of people), 
and it is suspected that a small number (e.g., up to one thousand) carry 
a disease that can be diagnosed using costly blood tests. The idea is that, 
instead of testing blood samples one by one, it might be possible to pool 
them in fairly large groups, and then apply the tests on the groups without 
affecting reliability of the tests. Once a group is tested negative, all the samples 
participating in the group must be negative and this may save a large number 
of tests. Otherwise, a positive test reveals that at least one of the individuals 
in the group must be positive (though we do not learn which). 

The main challenge in group testing is to design the pools in such a way as to
allow identification of the exact set of infected individuals using as few tests as
possible, thereby economizing the identification process of the affected
individuals. In Chapter 4 we study the group testing problem and its variations.
In particular, we consider a scenario where the tests can produce highly un- 
reliable outcomes, in which case the scheme must be designed in such a way 
that allows correction of errors caused by the presence of unreliable measure- 
ments. Moreover, we study a more general threshold variation of the problem 
in which a test returns positive if the number of positives participating in 
the test surpasses a certain threshold. This is a more reasonable model than 
the classical one, when the tests are not sufficiently sensitive and may be af- 
fected by dilution of the samples pooled together. In both models, we will use 
randomness condensers as combinatorial building blocks for construction of 
optimal, or nearly optimal, explicit measurement schemes that also tolerate 
erroneous outcomes. 
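
The pooling idea can be sketched in Python as follows. The random pool design, the parameter values, and the function names are assumptions for illustration only. The explicit constructions of Chapter 4 are based on condensers rather than random pools, and they additionally handle errors. The `threshold` argument models the threshold variation, with `threshold=1` being the classical model. The decoder shown applies only to the classical, error-free case.

```python
import random

def make_pools(num_items, num_tests, pool_prob, rng):
    """Random non-adaptive design: each item joins each pool independently."""
    return [{i for i in range(num_items) if rng.random() < pool_prob}
            for _ in range(num_tests)]

def pool_outcomes(pools, defectives, threshold=1):
    """A pool is positive when it holds at least `threshold` defectives."""
    return [len(pool & defectives) >= threshold for pool in pools]

def decode_classical(pools, outcomes, num_items):
    """Error-free classical decoder: everyone in a negative pool is cleared."""
    candidates = set(range(num_items))
    for pool, positive in zip(pools, outcomes):
        if not positive:
            candidates -= pool
    return candidates

# Hypothetical instance: 100 items, 3 defectives, 40 pooled tests.
defectives = {3, 41, 77}
pools = make_pools(100, 40, 0.25, random.Random(7))
candidates = decode_classical(pools, pool_outcomes(pools, defectives), 100)
```

In the error-free classical model the returned candidate set always contains every defective item. How many non-defective items remain in it depends on the design, which is exactly what a good construction must control.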

Capacity Achieving Codes 

The theory of error-correcting codes aims to guarantee reliable transmission 
of information over an unreliable communication medium, known in technical 
terms as a channel. In a classical model, messages are encoded into sequences 
of bits at their source, which are subsequently transmitted through the
channel. Each bit being transmitted through the channel may be flipped (from
0 to 1 or vice versa) with a small probability.

Using an error-correcting code, the encoded sequence can be designed in
such a way as to allow correct recovery of the message at the destination with an
overwhelming probability (over the randomness of the channel). However, the
cost incurred by such an encoding scheme is a loss in the transmission rate,
that is, the ratio between the information content of the original message and
the length of the encoded sequence (or in other words, the effective number
of bits transmitted per channel use).

A capacity achieving code is an error-correcting code that essentially
maximizes the transmission rate, while keeping the error probability negligible.



The maximum possible rate depends on the channel being considered, and is 
a quantity given by the Shannon capacity of the channel. 
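
For the bit-flipping channel described above, commonly called the binary symmetric channel (a name not used in the text), the capacity has the well-known closed form $1 - h(p)$, where $p$ is the flip probability and $h$ is the binary entropy function. A minimal Python sketch, with function names chosen here for illustration:

```python
from math import log2

def binary_entropy(p):
    """h(p) = -p log2(p) - (1 - p) log2(1 - p), with h(0) = h(1) = 0."""
    if p in (0.0, 1.0):
        return 0.0
    return -p * log2(p) - (1 - p) * log2(1 - p)

def bsc_capacity(p):
    """Shannon capacity, in bits per channel use, for flip probability p."""
    return 1.0 - binary_entropy(p)
```

For example, a flip probability of about 0.11 already halves the achievable rate, since `bsc_capacity(0.11)` is approximately 0.5.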

In Chapter 5 we consider a general class of communication channels
(including the above example) and show how randomness condensers and
extractors can be used to design capacity achieving ensembles of codes for them.
We will then use the obtained ensembles to obtain explicit constructions of 
capacity achieving codes that allow efficient encoding and decoding as well. 

Codes on the Gilbert-Varshamov Bound

While randomness extractors aim to eliminate the need for pure randomness
in algorithms, a related class of objects known as pseudorandom generators
aim to eliminate randomness altogether. This is made meaningful by a
fundamental idea saying that randomness should be defined relative to the
observer. The idea can be perhaps best described by an example due to
Goldreich [70, Chapter 8], quoted below:



"Alice and Bob play head or tail in one of the following four 
ways. In all of them Alice flips a coin high in the air, and Bob 
is asked to guess its outcome before the coin hits the floor. The 
alternative ways differ by the knowledge Bob has before making 
his guess. 

In the first alternative, Bob has to announce his guess before Alice 
flips the coin. Clearly, in this case Bob wins with probability 1/2. 

In the second alternative, Bob has to announce his guess while the 
coin is spinning in the air. Although the outcome is determined in 
principle by the motion of the coin, Bob does not have accurate 
information on the motion. Thus we believe that, also in this case 
Bob wins with probability 1/2. 

The third alternative is similar to the second, except that Bob 
has at his disposal sophisticated equipment capable of providing 
accurate information on the coin's motion as well as on the envi- 
ronment affecting the outcome. However, Bob cannot process this 
information in time to improve his guess. 

In the fourth alternative, Bob's recording equipment is directly 
connected to a powerful computer programmed to solve the motion 
equations and output a prediction. It is conceivable that in such 
a case Bob can improve substantially his guess of the outcome of 
the coin." 

Following the above description, in principle the outcome of a coin flip may 
well be deterministic. However, as long as the observer does not have enough 
resources to gain any advantage predicting the outcome, the coin flip should be 
considered random for him. In this example, what makes the coin flip random
for the observer is the inherent hardness (and not necessarily impossibility)
of the prediction procedure. The theory of pseudorandom generators aims to
express this line of thought in rigorous ways, and study the circumstances 
under which randomness can be simulated for a particular class of observers. 

The advent of probabilistic algorithms that are unparalleled by determinis- 
tic methods, such as randomized primality testing (before the AKS algorithm 
[3]), polynomial identity testing and the like initially made researchers believe 
that the class of problems solvable by randomized polynomial-time algorithms 
(in symbols, BPP) might be strictly larger than those solvable in polynomial
time without the need for randomness (namely, P) and conjecture P ≠ BPP.
To this date, the "P vs. BPP" problem remains one of the most challenging 
problems in theoretical computer science. 

Despite the initial belief, more recent research has led most theoreticians 
to believe otherwise, namely that P = BPP. This is supported by recent dis- 
covery of deterministic algorithms such as the AKS primality test, and more 
importantly, the advent of strong pseudorandom generators. In a seminal
work [115], Nisan and Wigderson showed that a "hard to compute" function
can be used to efficiently transform a short sequence of random bits into a 
much longer sequence that looks indistinguishable from a purely random se- 
quence to any efficient algorithm. In short, they showed how to construct 
pseudorandomness from hardness. Though the underlying assumption (that
certain hard functions exist) is not yet proved, it is intuitively reasonable to
believe (just in the same way that, in the coin flipping game above, the hard- 
ness of gathering sufficient information for timely prediction of the outcome 
by Bob is reasonable to believe without proof). 

In Chapter^ we extend Nisan and Wigderson's method (originally aimed 
for probabilistic algorithms) to combinatorial constructions and show that, 
under reasonable hardness assumptions, a wide range of probabilistic combi- 
natorial constructions can be substantially derandomized. 

The specific combinatorial problem that the chapter is based on is the con- 
struction of error-correcting codes that attain the rate versus error-tolerance 
trade-off shown possible using the probabilistic method (namely, construction 
of codes on the so-called Gilbert-Varshamov bound). In particular, we
demonstrate a small ensemble of efficiently constructible error-correcting
codes, almost all of which are as good as random codes (under a reasonable
assumption). Even though the method is discussed for construction of error-correcting
codes, it can be equally applied to numerous other probabilistic constructions; 
e.g., construction of optimal Ramsey graphs. 

Reading Guidelines 

The material in each of the technical chapters of this thesis (Chapters 3-6)
is presented independently, so the chapters can be read in any order. Since
the theory of randomness extractors plays a central role in the technical
content of this thesis, Chapter 2 is devoted to an introduction to this theory, and
covers some basic constructions of extractors and condensers that are used as
building blocks in the main chapters. Since extractor theory is already
an extensively developed area, we will only touch upon basic topics that are
necessary for understanding the thesis.

Apart from extractors, we will extensively use fundamental notions of cod- 
ing theory throughout the thesis. For that matter, we have provided a brief 
review of such notions in Appendix |XJ 

The additional mathematical background required for each chapter is pro- 
vided when needed, to the extent of not losing focus. For a comprehensive 
study of the basic tools being used, we refer the reader to [5, 109, 112]
(probability, randomness in algorithms, and probabilistic constructions), [82]
(expander graphs), [8, 70] (modern complexity theory), [98, 103, 127] (coding
theory and basic algebra needed), [74] (list decoding), and [50, 51]
(combinatorial group testing).

Each chapter of the thesis is concluded by the opening notes of a piece of
music that I truly admire.

Johann Sebastian Bach (1685-1750): The Art of Fugue BWV 1080,
Contrapunctus XIV.



Chapter 2 

Extractor Theory 



"Art would be useless if the 
world were perfect, as man 
wouldn't look for harmony but 
would simply live in it. " 

— Andrei Tarkovsky 



Suppose that you are given a possibly biased coin that falls heads some p 
fraction of times (0 < p < 1) and are asked to use it to "simulate" fair 
coin flips. A natural approach to solve this problem would be to first try to 
"learn" the bias p by flipping the coin a large number of times and observing 
the fraction of times it falls heads during the experiment, and then using this 
knowledge to encode the sequence of biased flips to its information-theoretic 
entropy. 



Remarkably, back in 1951 John von Neumann [159] demonstrated a simple
way to solve this problem without knowing the bias p: flip the coin twice and
one of the following cases may occur:

1. The first flip shows Heads and the second Tails: output "H". 

2. The first flip shows Tails and the second Heads: output "T". 

3. Otherwise, repeat the experiment. 

Note that, in each round, the probability that the output symbol is "H" is
precisely equal to the probability that it is "T", namely p(1 - p). Thus, the
outcome of this process represents
a perfectly fair coin toss. This procedure might be somewhat wasteful; for 
instance, it is expected to waste half of the coin flips even if p = 1/2 (that 
is, if the coin is already fair) and that is the cost we pay for not knowing 
p. But nevertheless, it transforms an imperfect, not fully known, source of 
randomness into a perfect source of random bits. 
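The three cases above can be sketched in a few lines of Python. This is only an illustrative simulation: the helper names `biased_coin`, `von_neumann` and `fair_flips` are hypothetical, the coin is simulated with Python's pseudorandom generator, and the bias is assumed to satisfy 0 < p < 1 (otherwise the loop never produces an output).

```python
import random
from typing import Callable, Iterator, List


def biased_coin(p: float, rng: random.Random) -> Callable[[], str]:
    """Return a flip() function that yields "H" with probability p, else "T"."""
    return lambda: "H" if rng.random() < p else "T"


def von_neumann(flip: Callable[[], str]) -> Iterator[str]:
    """Turn independent flips of a coin with unknown bias into fair symbols."""
    while True:
        first, second = flip(), flip()
        if first != second:
            # Cases 1 and 2: HT outputs "H", TH outputs "T".
            yield first
        # Case 3 (HH or TT): discard both flips and repeat.


def fair_flips(p: float, count: int, seed: int = 0) -> List[str]:
    """Collect `count` output symbols from a simulated coin with bias p."""
    stream = von_neumann(biased_coin(p, random.Random(seed)))
    return [next(stream) for _ in range(count)]
```

For example, `fair_flips(0.9, 1000)` draws 1000 output symbols from a heavily biased coin; the procedure itself never looks at the value of p.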

This example, while simple, demonstrates the basic idea in what is known 
as "extractor theory". The basic goal in extractor theory is to improve ran- 
domness, that is, to efficiently transform a "weak" source of randomness into 
one with better qualities; in particular, having a higher entropy per symbol. 
The procedure shown above, seen as a function from the sequence of coin flips
to a Boolean outcome (over {H, T}), is known as an extractor. It is called so
since it "extracts" pure randomness from a weak source.

When the distribution of the weak source is known, it is possible to use 
techniques from source coding (say, Huffman or arithmetic coding) to
compress the information to a number of bits very close to its actual entropy,
without losing any of the source information. What makes extractor theory
particularly challenging are the following issues:

1. An extractor knows little about the exact source distribution: typically
nothing more than a lower bound on the source entropy, and no structure
is assumed on the source. In the above example, even though the source
distribution was unknown, it was known to be an i.i.d. sequence (i.e., 
a sequence of independent, identically distributed symbols). This need 
not be the case in general. 

2. The output of the extractor must "strongly" resemble a uniform distri- 
bution (which is the distribution with maximum possible entropy), in 
the sense that no statistical test (no matter how complex) should be 
able to distinguish between the output distribution and a purely random
sequence. Note, for example, that a sequence of n - 1 uniform and
independent bits followed by the symbol "0" has n - 1 bits of entropy,
which is only slightly lower than that of n purely random bits (i.e., n).
However, a simple statistical test can trivially distinguish between the 
two distributions by only looking at the last bit. 

Since extractors and related objects (in particular, lossless condensers) 
play a central role in the technical core of this thesis, we devote this chapter 
to a formal treatment of extractor theory, introducing the basic ideas and 
some fundamental constructions. In this chapter, we will only cover basic 
notions and discuss a few of the results that will be used as building blocks in
the rest of the thesis.

2.1 Probability Distributions 

2.1.1 Distributions and Distance 

In this thesis we will focus on probability distributions over finite domains.
Let $(\Omega, E, \mathcal{X})$ be a probability space, where $\Omega$ is a finite sample space, $E$ is the
set of events (which, in our work, will always consist of all subsets of $\Omega$),
and $\mathcal{X}$ is a probability measure. The probability assigned to each outcome
$x \in \Omega$ by $\mathcal{X}$ will be denoted by $\mathcal{X}(x)$, or $\Pr_{\mathcal{X}}(x)$. Similarly, for an event $T \in E$,
we will denote the probability assigned to $T$ by $\mathcal{X}(T)$, or $\Pr_{\mathcal{X}}[T]$ (when clear
from the context, we may omit the subscript $\mathcal{X}$). The support of $\mathcal{X}$ is defined
as

\[ \mathrm{supp}(\mathcal{X}) := \{ x \in \Omega : \mathcal{X}(x) > 0 \}. \]

A particularly important probability measure is defined by the uniform
distribution, which assigns equal probabilities to each element of $\Omega$. We will
denote the uniform distribution over $\Omega$ by $\mathcal{U}_\Omega$, and use the shorthand $\mathcal{U}_n$, for
an integer $n \geq 1$, for $\mathcal{U}_{\{0,1\}^n}$. We will use the notation $X \sim \mathcal{X}$ to denote that
the random variable $X$ is drawn from the probability distribution $\mathcal{X}$.

It is often convenient to think about the probability measure as a real
vector of dimension $|\Omega|$, whose entries are indexed by the elements of $\Omega$, such
that the value at the $x$th entry of the vector is $\mathcal{X}(x)$.

An important notion for our work is the distance between distributions.
There are several notions of distance in the literature, some stronger than
the others, and often the most suitable choice depends on the particular
application at hand. For our applications, the most important notion is the
$\ell_p$ distance:

Definition 2.1. Let $\mathcal{X}$ and $\mathcal{Y}$ be probability distributions on a finite domain
$\Omega$. Then for every $p \geq 1$, their $\ell_p$ distance, denoted by $\|\mathcal{X} - \mathcal{Y}\|_p$, is given by

\[ \|\mathcal{X} - \mathcal{Y}\|_p := \Big( \sum_{x \in \Omega} |\mathcal{X}(x) - \mathcal{Y}(x)|^p \Big)^{1/p}. \]

We extend the definition to the special case $p = \infty$, to denote the point-wise
distance:

\[ \|\mathcal{X} - \mathcal{Y}\|_\infty := \max_{x \in \Omega} |\mathcal{X}(x) - \mathcal{Y}(x)|. \]

The distributions $\mathcal{X}$ and $\mathcal{Y}$ are called $\epsilon$-close with respect to the $\ell_p$ norm if
and only if $\|\mathcal{X} - \mathcal{Y}\|_p \leq \epsilon$.

We remark that, by the Cauchy-Schwarz inequality, the following relationship
between $\ell_1$ and $\ell_2$ distances holds:

\[ \|\mathcal{X} - \mathcal{Y}\|_2 \leq \|\mathcal{X} - \mathcal{Y}\|_1 \leq \sqrt{|\Omega|} \cdot \|\mathcal{X} - \mathcal{Y}\|_2. \]

Of particular importance is the statistical (or total variation) distance.
This is defined as half the $\ell_1$ distance between the distributions:

\[ \|\mathcal{X} - \mathcal{Y}\| := \tfrac{1}{2} \|\mathcal{X} - \mathcal{Y}\|_1. \]

We may also use the notation $\mathrm{dist}(\mathcal{X}, \mathcal{Y})$ to denote the statistical distance. We
call two distributions $\epsilon$-close if and only if their statistical distance is at most
$\epsilon$. When there is no risk of confusion, we may extend notions such as distance
from distributions to the random variables sampled from them and, for
instance, talk about two random variables being $\epsilon$-close.

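This is, in a sense, a very strong notion of distance since, as the following
proposition suggests, it captures the worst-case difference between the
probability assigned by the two distributions to any event:

Proposition 2.2. Let $\mathcal{X}$ and $\mathcal{Y}$ be distributions on a finite domain $\Omega$. Then $\mathcal{X}$
and $\mathcal{Y}$ are $\epsilon$-close if and only if for every event $T \subseteq \Omega$, $|\Pr_{\mathcal{X}}[T] - \Pr_{\mathcal{Y}}[T]| \leq \epsilon$.

Proof. Denote by $\Omega_{\mathcal{X}}$ and $\Omega_{\mathcal{Y}}$ the following partition of $\Omega$:

\[ \Omega_{\mathcal{X}} := \{ x \in \Omega : \mathcal{X}(x) \geq \mathcal{Y}(x) \}, \qquad \Omega_{\mathcal{Y}} := \Omega \setminus \Omega_{\mathcal{X}}. \]

Thus, $\|\mathcal{X} - \mathcal{Y}\|_1 = 2(\Pr_{\mathcal{X}}(\Omega_{\mathcal{X}}) - \Pr_{\mathcal{Y}}(\Omega_{\mathcal{X}})) = 2(\Pr_{\mathcal{Y}}(\Omega_{\mathcal{Y}}) - \Pr_{\mathcal{X}}(\Omega_{\mathcal{Y}}))$. Let
$p_1 := \Pr_{\mathcal{X}}[T \cap \Omega_{\mathcal{X}}] - \Pr_{\mathcal{Y}}[T \cap \Omega_{\mathcal{X}}]$, and $p_2 := \Pr_{\mathcal{Y}}[T \cap \Omega_{\mathcal{Y}}] - \Pr_{\mathcal{X}}[T \cap \Omega_{\mathcal{Y}}]$. Both
$p_1$ and $p_2$ are nonnegative numbers, each no more than $\epsilon$. Therefore,

\[ |\Pr_{\mathcal{X}}[T] - \Pr_{\mathcal{Y}}[T]| = |p_1 - p_2| \leq \epsilon. \]

For the reverse direction, suppose that for every event $T \subseteq \Omega$,
$|\Pr_{\mathcal{X}}[T] - \Pr_{\mathcal{Y}}[T]| \leq \epsilon$. Then,

\[ \|\mathcal{X} - \mathcal{Y}\|_1 = |\Pr_{\mathcal{X}}[\Omega_{\mathcal{X}}] - \Pr_{\mathcal{Y}}[\Omega_{\mathcal{X}}]| + |\Pr_{\mathcal{X}}[\Omega_{\mathcal{Y}}] - \Pr_{\mathcal{Y}}[\Omega_{\mathcal{Y}}]| \leq 2\epsilon. \]

□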
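As a sanity check of this equivalence on a tiny domain, the following Python sketch computes the statistical distance directly and compares it with a brute-force maximum over all events. The function names and the dictionary representation of distributions are assumptions made for illustration; the enumeration of events is exponential in the domain size and only meant for toy examples. The second distribution is the one mentioned earlier in this chapter: n - 1 uniform bits followed by a fixed "0".

```python
from itertools import chain, combinations, product
from typing import Dict, Tuple

Dist = Dict[Tuple[int, ...], float]


def statistical_distance(x: Dist, y: Dist) -> float:
    """Half the l1 distance between two distributions on a finite domain."""
    domain = set(x) | set(y)
    return 0.5 * sum(abs(x.get(w, 0.0) - y.get(w, 0.0)) for w in domain)


def worst_event_gap(x: Dist, y: Dist) -> float:
    """Maximum of |Pr_x[T] - Pr_y[T]| over all events T (brute force)."""
    domain = sorted(set(x) | set(y))
    events = chain.from_iterable(
        combinations(domain, r) for r in range(len(domain) + 1)
    )
    return max(abs(sum(x.get(w, 0.0) - y.get(w, 0.0) for w in t)) for t in events)


n = 3
uniform = {bits: 2.0 ** -n for bits in product((0, 1), repeat=n)}
# n - 1 uniform bits followed by a fixed 0: n - 1 bits of entropy, far from uniform.
last_zero = {bits + (0,): 2.0 ** -(n - 1) for bits in product((0, 1), repeat=n - 1)}
```

Here both `statistical_distance(uniform, last_zero)` and `worst_event_gap(uniform, last_zero)` evaluate to 1/2; the event attaining the maximum is "the last bit equals 1".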



An equivalent way of looking at an event $T \subseteq \Omega$ is by defining a predicate
$P\colon \Omega \to \{0,1\}$ whose set of accepting inputs is $T$; namely, $P(x) = 1$ if and
only if $x \in T$. In this view, Proposition 2.2 can be written in the following
equivalent form.

Proposition 2.3. Let $\mathcal{X}$ and $\mathcal{Y}$ be distributions on the same finite domain $\Omega$.
Then $\mathcal{X}$ and $\mathcal{Y}$ are $\epsilon$-close if and only if, for every distinguisher $P\colon \Omega \to \{0,1\}$,
we have

\[ \Big| \Pr_{X \sim \mathcal{X}}[P(X) = 1] - \Pr_{Y \sim \mathcal{Y}}[P(Y) = 1] \Big| \leq \epsilon. \]

□

The notion of convex combination of distributions is defined as follows:

Definition 2.4. Let $\mathcal{X}_1, \mathcal{X}_2, \ldots, \mathcal{X}_n$ be probability distributions over a finite
space $\Omega$ and $\alpha_1, \alpha_2, \ldots, \alpha_n$ be nonnegative real values that sum up to 1. Then
the convex combination

\[ \alpha_1 \mathcal{X}_1 + \alpha_2 \mathcal{X}_2 + \cdots + \alpha_n \mathcal{X}_n \]

is a distribution $\mathcal{X}$ over $\Omega$ given by the probability measure

\[ \Pr_{\mathcal{X}}(x) := \sum_{i=1}^{n} \alpha_i \Pr_{\mathcal{X}_i}(x), \]

for every $x \in \Omega$.

When regarding probability distributions as vectors of probabilities (with
coordinates indexed by the elements of the sample space), convex combination
of distributions is merely a linear combination (specifically, a point-wise
average) of their vector forms. Thus intuitively, one expects that if a
probability distribution is close to a collection of distributions, it must be close to
any convex combination of them as well. This is made more precise in the
following proposition.

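Proposition 2.5. Let $\mathcal{X}_1, \mathcal{X}_2, \ldots, \mathcal{X}_n$ be probability distributions, all defined
over the same finite set $\Omega$, that are all $\epsilon$-close to some distribution $\mathcal{Y}$. Then
any convex combination

\[ \mathcal{X} := \alpha_1 \mathcal{X}_1 + \alpha_2 \mathcal{X}_2 + \cdots + \alpha_n \mathcal{X}_n \]

is $\epsilon$-close to $\mathcal{Y}$.

Proof. We give a proof for the case $n = 2$, which generalizes to any larger
number of distributions by induction. Let $T \subseteq \Omega$ be any nonempty subset of
$\Omega$. Then we have

\[ |\Pr_{\mathcal{X}}[T] - \Pr_{\mathcal{Y}}[T]| = |\alpha_1 \Pr_{\mathcal{X}_1}[T] + (1 - \alpha_1) \Pr_{\mathcal{X}_2}[T] - \Pr_{\mathcal{Y}}[T]| \]
\[ = |\alpha_1 (\Pr_{\mathcal{Y}}[T] + \epsilon_1) + (1 - \alpha_1)(\Pr_{\mathcal{Y}}[T] + \epsilon_2) - \Pr_{\mathcal{Y}}[T]|, \]

where $|\epsilon_1|, |\epsilon_2| \leq \epsilon$ by the assumption that $\mathcal{X}_1$ and $\mathcal{X}_2$ are $\epsilon$-close to $\mathcal{Y}$. Hence
the distance simplifies to

\[ |\alpha_1 \epsilon_1 + (1 - \alpha_1) \epsilon_2|, \]

and this is at most $\epsilon$. □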

In a similar manner, it is straightforward to see that a convex combination
$(1 - \epsilon)\mathcal{X} + \epsilon\mathcal{Y}$ is $\epsilon$-close to $\mathcal{X}$.
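One way to spell out this last remark is the following short calculation, which uses only the definition of statistical distance and the fact that the statistical distance between two distributions never exceeds 1:

```latex
\[
\bigl\| (1-\epsilon)\mathcal{X} + \epsilon\mathcal{Y} - \mathcal{X} \bigr\|
  = \tfrac{1}{2} \sum_{x \in \Omega}
      \bigl| \epsilon\,\mathcal{Y}(x) - \epsilon\,\mathcal{X}(x) \bigr|
  = \epsilon \,\| \mathcal{Y} - \mathcal{X} \|
  \leq \epsilon .
\]
```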

Sometimes, in order to show a claim for a probability distribution it may be 
easier, and yet sufficient, to write the distribution as a convex combination of 
"simpler" distributions and then prove the claim for the simpler components. 
We will see examples of this technique when we analyze constructions of extractors
and condensers.

2.1.2 Entropy 

A central notion in the study of randomness is related to the information
content of a probability distribution. Shannon formalized this notion in the
following form:

Definition 2.6. Let $\mathcal{X}$ be a distribution on a finite domain $\Omega$. The Shannon
entropy of $\mathcal{X}$ (in bits) is defined as

\[ H(\mathcal{X}) := \sum_{x \in \mathrm{supp}(\mathcal{X})} -\mathcal{X}(x) \log_2 \mathcal{X}(x) = \mathbb{E}_{X \sim \mathcal{X}}[-\log_2 \mathcal{X}(X)]. \]

Intuitively, Shannon entropy quantifies the number of bits required to specify
a sample drawn from $\mathcal{X}$ on average. This intuition is made more precise, for
example, by Huffman coding, which gives an efficient algorithm for encoding a
random variable to a binary sequence whose expected length is almost equal
to the Shannon entropy of the random variable's distribution (cf. [40]). For
numerous applications in computer science and cryptography, however, the
notion of Shannon entropy (which is an average-case notion) is not well
suited and a worst-case notion of entropy is required. Such a notion is captured
by min-entropy, defined below.

Definition 2.7. Let $\mathcal{X}$ be a distribution on a finite domain $\Omega$. The min-entropy
of $\mathcal{X}$ (in bits) is defined as

\[ H_\infty(\mathcal{X}) := \min_{x \in \mathrm{supp}(\mathcal{X})} -\log_2 \mathcal{X}(x). \]

Therefore, the min-entropy of a distribution is at least $k$ if and only if the
distribution assigns a probability of at most $2^{-k}$ to any point of the sample
space (such a distribution is called a $k$-source). It also immediately follows by
definitions that a distribution having min-entropy at least $k$ must also have a
Shannon entropy of at least $k$. When $\Omega = \{0,1\}^n$, we define the entropy rate
of a distribution $\mathcal{X}$ on $\Omega$ as $H_\infty(\mathcal{X})/n$.
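To make the difference between the two notions concrete, here is a small Python sketch of Definitions 2.6 and 2.7. The function names and the representation of a distribution as a dictionary from outcomes to probabilities are illustrative assumptions.

```python
import math
from typing import Dict, Hashable

Dist = Dict[Hashable, float]


def shannon_entropy(dist: Dist) -> float:
    """Average-case entropy: expected value of -log2 of the outcome's probability."""
    return sum(-p * math.log2(p) for p in dist.values() if p > 0)


def min_entropy(dist: Dist) -> float:
    """Worst-case entropy: -log2 of the largest point probability."""
    return -math.log2(max(dist.values()))


def is_k_source(dist: Dist, k: float) -> bool:
    """A k-source assigns probability at most 2^(-k) to every point."""
    return max(dist.values()) <= 2.0 ** -k


skewed = {"a": 0.5, "b": 0.25, "c": 0.125, "d": 0.125}
```

For the distribution `skewed`, the Shannon entropy is 1.75 bits while the min-entropy is only 1 bit, since the single outcome "a" already carries half of the probability mass.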

A particular class of probability distributions for which the notions of
Shannon entropy and min-entropy coincide is flat distributions. A distribution
on $\Omega$ is called flat if it is uniformly supported on a set $T \subseteq \Omega$; that is, if
it assigns probability $1/|T|$ to all the points on $T$ and zero elsewhere. The
Shannon and min-entropies of such a distribution are both $\log_2 |T|$ bits.

An interesting feature of flat distributions is that their convex combinations
can define any arbitrary probability distribution while nicely preserving
the min-entropy, as shown below.

Proposition 2.8. Let $K$ be an integer. Then any distribution $\mathcal{X}$ with min-entropy
at least $\log K$ can be described as a convex combination of flat
distributions with min-entropy $\log K$.

Proof. Suppose that $\mathcal{X}$ is distributed on a finite domain $\Omega$. Any probability
distribution on $\Omega$ can be regarded as a real vector with coordinates indexed by
the elements of $\Omega$, encoding its probability measure. The set of distributions
$(p_i)_{i \in \Omega}$ with min-entropy at least $\log K$ forms a simplex

\[ (\forall i \in \Omega) \quad 0 \leq p_i \leq 1/K, \]

whose corner points are flat distributions. The claim follows since every point
in the simplex can be written as a convex combination of the corner points. □
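The proof above is an existence argument. For small examples, one possible way to actually compute such a decomposition is the greedy procedure sketched below in Python. It is not taken from the text: the function name `flat_decomposition` is hypothetical, and probabilities are assumed to be given as exact `Fraction` values so that no rounding occurs. In each round the procedure peels off a flat distribution supported on the K currently heaviest points, with the largest weight that keeps every remaining probability between 0 and (remaining weight)/K.

```python
from fractions import Fraction
from typing import Dict, FrozenSet, Hashable, List, Tuple


def flat_decomposition(
    dist: Dict[Hashable, Fraction], K: int
) -> List[Tuple[Fraction, FrozenSet[Hashable]]]:
    """Write `dist` (all probabilities <= 1/K) as a convex combination of
    flat distributions, each uniform on exactly K points.

    Returns (weight, support) pairs whose weights sum to 1.
    """
    # Scale by K so that every entry lies in [0, 1] and the entries sum to K.
    residual = {x: Fraction(p) * K for x, p in dist.items()}
    if sum(residual.values()) != K or max(residual.values()) > 1:
        raise ValueError("need a distribution with min-entropy at least log K")
    weight_left = Fraction(1)
    parts = []
    while weight_left > 0:
        ranked = sorted(residual, key=residual.get, reverse=True)
        support, rest = ranked[:K], ranked[K:]
        # Largest step keeping entries in `support` nonnegative ...
        step = min(residual[x] for x in support)
        if rest:
            # ... and entries outside `support` at most the remaining weight.
            step = min(step, weight_left - residual[rest[0]])
        for x in support:
            residual[x] -= step
        weight_left -= step
        parts.append((step, frozenset(support)))
    return parts
```

For instance, with K = 2 the distribution (1/2, 1/4, 1/8, 1/8) on the points a, b, c, d is split into flat distributions on {a, b}, {a, c} and {a, d} with weights 1/2, 1/4 and 1/4.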

2.2 Extractors and Condensers 

2.2.1 Definitions 

Intuitively, an extractor is a function that transforms impure randomness, i.e.,
a random source containing a sufficient amount of entropy, to an almost uniform
distribution (with respect to a suitable distance measure, e.g., statistical
distance).

Suppose that a source $\mathcal{X}$ is distributed on a sample space $\Omega := \{0,1\}^n$
with a distribution containing at least $k$ bits of min-entropy. The goal is
to construct a function $f\colon \{0,1\}^n \to \{0,1\}^m$ such that $f(\mathcal{X})$ is $\epsilon$-close to
the uniform distribution $\mathcal{U}_m$, for a negligible distance $\epsilon$ (e.g., $\epsilon = 2^{-\Omega(n)}$).
Unfortunately, without having any further knowledge on $\mathcal{X}$, this task becomes
impossible. To see why, consider the simplest nontrivial case where $k = n - 1$
and $m = 1$, and suppose that we have come up with a function $f$ that extracts
one almost unbiased coin flip from any $k$-source. Observe that among the set
of pre-images of 0 and 1 under $f$, namely $f^{-1}(0)$ and $f^{-1}(1)$, at least one must
have size $2^{n-1}$ or more. Let $\mathcal{X}$ be the flat source uniformly distributed on this
set. The distribution $\mathcal{X}$ constructed this way has min-entropy at least $n - 1$
yet $f(\mathcal{X})$ is always constant. In order to alleviate this obvious impossibility,
one of the following two solutions is typically considered:

1. Assume some additional structure on the source: In the counterexample
above, we constructed an opportunistic choice of the source $\mathcal{X}$ from the
function $f$. However, in general the source obtained this way may turn
out to be exceedingly complex and unstructured, and the fact that $f$
is unable to extract any randomness from this particular choice of the
source might be of little concern. A suitable way to model this observation
is to require a function $f$ that is expected to extract randomness
only from a restricted class of randomness sources.

The appropriate restriction in question may depend on the context for 
which the extractor is being used. A few examples that have been con- 
sidered in the literature include: 

• Independent sources: In this case, the source $\mathcal{X}$ is restricted to be a
product distribution with two or more components. In particular,
one may assume the source to be the product distribution of $r \geq 2$
independent random variables $X_1, \ldots, X_r \in \{0,1\}^{n'}$ that are each
sampled from an arbitrary $k'$-source (assuming $n = rn'$ and $k = rk'$).

• Affine sources: We assume that the source $\mathcal{X}$ is uniformly supported
on an arbitrary translation of an unknown $k$-dimensional vector
subspace of $\mathbb{F}_2^n$.¹ A further restriction of this class is known as bit-fixing
sources. A bit-fixing source is a product distribution of $n$ bits
$(X_1, \ldots, X_n)$ where for some unknown set of coordinate positions
$S \subseteq [n]$ of size $k$, the variables $X_i$ for $i \in S$ are independent and
uniform bits, but the rest of the $X_i$'s are fixed to unknown binary
values. In Chapter 3 we will discuss these classes of sources in
more detail.

• Samplable sources: This is a class of sources first studied by Trevisan
and Vadhan [153]. In broad terms, a samplable source is a
source $\mathcal{X}$ such that a sample from $\mathcal{X}$ can be produced out of a sequence
of random and independent coin flips by a restricted computational
model. For example, one may consider the class of sources of min-entropy
$k$ such that for any source $\mathcal{X}$ in the class, there is a function
$f\colon \{0,1\}^r \to \{0,1\}^n$, for some $r \geq k$, that is computable by
polynomial-size Boolean circuits and satisfies $f(\mathcal{U}_r) \sim \mathcal{X}$.

For restricted classes of sources such as the above examples, there are 
deterministic functions that are good extractors for all the sources in the 
family. Such deterministic functions are known as seedless extractors for 
the corresponding family of sources. For instance, an affine extractor for
entropy $k$ and error $\epsilon$ (in symbols, an affine $(k, \epsilon)$-extractor) is a mapping
$f\colon \mathbb{F}_2^n \to \mathbb{F}_2^m$ such that for every affine $k$-source $\mathcal{X}$, the distribution $f(\mathcal{X})$
is $\epsilon$-close to the uniform distribution $\mathcal{U}_m$.

In fact, it is not hard to see that for any family of not "too many" sources,
there is a function that extracts almost the entire source entropy of the
sources (examples include affine $k$-sources, samplable $k$-sources, and two
independent sources²). This can be shown by a probabilistic argument
that considers a random function and shows that it achieves the desired 
properties with overwhelming probability. 

2. Allow a short random seed: The second solution is to allow the extractor
to use a small amount of pure randomness as a "catalyst". Namely,
the extractor is allowed to require two inputs: a sample from the unknown
source and a short sequence of random and independent bits that
is called the seed. In this case, it turns out that extracting almost the
entire entropy of the weak source becomes possible, without any structural
assumptions on the source and using a very short independent seed.
Extractors that require an auxiliary random input are called seeded
extractors. In fact, an equivalent way of looking at seeded extractors is to see
them as seedless extractors that assume the source to be structured as
a product distribution of two sources: an arbitrary $k$-source and the
uniform distribution.

¹ Throughout the thesis, for a prime power $q$, we will use the notation $\mathbb{F}_q$ to denote the
finite field with $q$ elements.

² For this case, it suffices to count the number of independent flat sources.
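The impossibility argument for seedless extraction from arbitrary sources, given at the beginning of this section, is constructive and can be replayed for any candidate function on small inputs. The Python sketch below does this for one-bit outputs; the name `bad_source_for` is hypothetical and the enumeration of all of {0,1}^n is only feasible for small n.

```python
from itertools import product
from typing import Callable, List, Tuple

Bits = Tuple[int, ...]


def bad_source_for(f: Callable[[Bits], int], n: int) -> List[Bits]:
    """Given any f: {0,1}^n -> {0,1}, return the support of a flat source with
    min-entropy at least n - 1 on which f is constant."""
    preimages = {0: [], 1: []}
    for x in product((0, 1), repeat=n):
        preimages[f(x)].append(x)
    # At least one of the two pre-images contains 2^(n-1) or more points.
    return max(preimages.values(), key=len)


def parity(x: Bits) -> int:
    """An example candidate extractor: the XOR of all input bits."""
    return sum(x) % 2
```

For example, `bad_source_for(parity, 4)` returns eight strings of the same parity, so the flat source on them has 3 bits of min-entropy while `parity` outputs a constant on it.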

For the rest of this chapter, we will focus on seeded extractors. Seedless
extractors (especially affine extractors) are treated in Chapter 3. A formal
definition of (seeded) extractors is as follows.

Definition 2.9. A function $f\colon \{0,1\}^n \times \{0,1\}^d \to \{0,1\}^m$ is a $(k, \epsilon)$-extractor
if, for every $k$-source $\mathcal{X}$ on $\{0,1\}^n$, the distribution $f(\mathcal{X}, \mathcal{U}_d)$ is $\epsilon$-close (in
statistical distance) to the uniform distribution on $\{0,1\}^m$. The parameters
$n$, $d$, $k$, $m$, and $\epsilon$ are respectively called the input length, seed length, entropy
requirement, output length, and error of the extractor.

An important aspect of randomness extractors is their computational
complexity. For most applications, extractors are required to be efficiently
computable functions. We call an extractor explicit if it is computable in
polynomial time (in its input length). Though it is rather straightforward to show
existence of good extractors using probabilistic arguments, coming up with a
nontrivial explicit construction can turn out to be a much more challenging task.
We will discuss and analyze several important explicit constructions of seeded
extractors in Section 2.3.

Note that, in the above definition of extractors, achieving an output length 
of up to d is trivial: the extractor can merely output its seed, which is guaran- 
teed to have a uniform distribution! Ideally the output of an extractor must 
be "almost independent" of its seed, so that the extra randomness given in 
the seed can be "recycled" . This idea is made precise in the notion of strong 
extractors given below. 

Definition 2.10. A function $f\colon \{0,1\}^n \times \{0,1\}^d \to \{0,1\}^m$ is a strong $(k, \epsilon)$-extractor
if, for every $k$-source $\mathcal{X}$ on $\{0,1\}^n$, and random variables $X \sim \mathcal{X}$,
$Z \sim \mathcal{U}_d$, the distribution of the random variable $(Z, f(X, Z))$ is $\epsilon$-close (in
statistical distance) to $\mathcal{U}_{d+m}$.
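For very small parameters, the two definitions can be checked by brute force. The Python sketch below (with the hypothetical name `extractor_error`) computes the worst-case statistical distance from uniform over all flat sources supported on 2^k points, for an integer k; by Propositions 2.5 and 2.8 this is enough to cover all k-sources. The number of flat sources grows very quickly, so this is meant only as an illustration.

```python
from itertools import combinations, product
from typing import Callable, Tuple

Bits = Tuple[int, ...]


def extractor_error(
    f: Callable[[Bits, Bits], Bits], n: int, d: int, m: int, k: int,
    strong: bool = False,
) -> float:
    """Worst-case distance from uniform of f(X, U_d), or of (U_d, f(X, U_d))
    when strong=True, over all flat sources X on 2^k points of {0,1}^n."""
    seeds = list(product((0, 1), repeat=d))
    out_len = d + m if strong else m
    uniform_mass = 2.0 ** -out_len
    worst = 0.0
    for support in combinations(product((0, 1), repeat=n), 2 ** k):
        mass = {}
        for x in support:
            for z in seeds:
                out = z + f(x, z) if strong else f(x, z)
                mass[out] = mass.get(out, 0.0) + 1.0 / (len(support) * len(seeds))
        # Outputs that never occur each contribute uniform_mass to the l1 distance.
        l1 = sum(abs(p - uniform_mass) for p in mass.values())
        l1 += (2 ** out_len - len(mass)) * uniform_mass
        worst = max(worst, 0.5 * l1)
    return worst


def output_seed(x: Bits, z: Bits) -> Bits:
    """The trivial "extractor" that ignores the source and outputs its seed."""
    return z
```

With n = 3, d = m = 2 and k = 1, `output_seed` has error 0 in the sense of Definition 2.9, but as a strong extractor its error is 3/4, because the pair (Z, Z) is supported on only 4 of the 16 possible strings.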

A fundamental property of strong extractors that is essential for certain
applications is that the extractor's output remains close to uniform for almost
all fixings of the random seed. This is made clear by an "averaging argument"
stated formally in the proposition below.

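Proposition 2.11. Consider joint distributions $\tilde{\mathcal{X}} := (\mathcal{Z}, \mathcal{X})$ and $\tilde{\mathcal{Y}} := (\mathcal{Z}, \mathcal{Y})$
that are $\epsilon$-close, where $\mathcal{X}$ and $\mathcal{Y}$ are distributions on a finite domain $\Omega$, and
$\mathcal{Z}$ is uniformly distributed on $\{0,1\}^d$. For every $z \in \{0,1\}^d$, denote by $\mathcal{X}_z$ the
distribution of the second coordinate of $\tilde{\mathcal{X}}$ conditioned on the first coordinate
being equal to $z$, and similarly define $\mathcal{Y}_z$ for the distribution $\tilde{\mathcal{Y}}$. Then, for
every $\delta > 0$, at least $(1 - \delta)2^d$ choices of $z \in \{0,1\}^d$ must satisfy

\[ \|\mathcal{X}_z - \mathcal{Y}_z\| \leq \epsilon/\delta. \]

Proof. Clearly, for every $\omega \in \Omega$ and $z \in \{0,1\}^d$, we have $\mathcal{X}_z(\omega) = 2^d \tilde{\mathcal{X}}(z, \omega)$
and similarly, $\mathcal{Y}_z(\omega) = 2^d \tilde{\mathcal{Y}}(z, \omega)$. Moreover from the definition of statistical
distance,

\[ \sum_{z \in \{0,1\}^d} \sum_{\omega \in \Omega} |\tilde{\mathcal{X}}(z, \omega) - \tilde{\mathcal{Y}}(z, \omega)| \leq 2\epsilon. \]

Therefore,

\[ \sum_{z \in \{0,1\}^d} \sum_{\omega \in \Omega} |\mathcal{X}_z(\omega) - \mathcal{Y}_z(\omega)| \leq 2^{d+1} \epsilon, \]

which can be true only if for at least a $(1 - \delta)$ fraction of the choices of $z$, we
have

\[ \sum_{\omega \in \Omega} |\mathcal{X}_z(\omega) - \mathcal{Y}_z(\omega)| \leq 2\epsilon/\delta, \]

or in other words,

\[ \|\mathcal{X}_z - \mathcal{Y}_z\| \leq \epsilon/\delta. \]

This shows the claim. □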



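Thus, according to Proposition 2.11 (applied with $\delta = \sqrt{\epsilon}$), for a strong $(k, \epsilon)$-extractor

\[ f\colon \{0,1\}^n \times \{0,1\}^d \to \{0,1\}^m \]

and a $k$-source $\mathcal{X}$, for at least a $1 - \sqrt{\epsilon}$ fraction of the choices of $z \in \{0,1\}^d$, the
distribution $f(\mathcal{X}, z)$ must be $\sqrt{\epsilon}$-close to uniform.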

Extractors are specializations of the more general notion of randomness 
condensers. Intuitively, a condenser transforms a given weak source of ran- 
domness into a "more purified" but possibly imperfect source. In general, the 
output entropy of a condenser might be substantially less than the input en- 
tropy but nevertheless, the output is generally required to have a substantially 
higher entropy rate. For the extremal case of extractors, the output entropy 
rate is required to be 1 (since the output is required to be an almost uniform
distribution). As with extractors, condensers can be seeded or seedless,
and seeded condensers can also be required to be strong (similar to strong
extractors). Below we define the general notion of strong, seeded condensers.

Definition 2.12. A function $f\colon \{0,1\}^n \times \{0,1\}^d \to \{0,1\}^m$ is a strong $k \to_\epsilon k'$
condenser if for every distribution $\mathcal{X}$ on $\{0,1\}^n$ with min-entropy at least $k$,
random variable $X \sim \mathcal{X}$ and a seed $Y \sim \mathcal{U}_d$, the distribution of $(Y, f(X, Y))$ is
$\epsilon$-close to a distribution $(\mathcal{U}_d, \mathcal{Z})$ with min-entropy at least $d + k'$. The parameters
$k$, $k'$, $\epsilon$, $k - k'$, and $m - k'$ are called the input entropy, output entropy,
error, the entropy loss and the overhead of the condenser, respectively. A
condenser is explicit if it is polynomial-time computable.

Similar to strong extractors, strong condensers remain effective under almost
all fixings of the seed. This follows immediately from Proposition 2.11
and is made explicit by the following corollary:

Corollary 2.13. Let $f\colon \{0,1\}^n \times \{0,1\}^d \to \{0,1\}^m$ be a strong $k \to_\epsilon k'$
condenser. Consider an arbitrary parameter $\delta > 0$ and a $k$-source $\mathcal{X}$. Then,
for all but at most a $\delta$ fraction of the choices of $z \in \{0,1\}^d$, the distribution
$f(\mathcal{X}, z)$ is $(\epsilon/\delta)$-close to a $k'$-source. □

Typically, a condenser is only interesting if the output entropy rate k' /m 
is considerably larger than the input entropy rate k/n. From the above defi- 
nition, an extractor is a condenser with zero overhead. Another extremal case 
corresponds to the case where the entropy loss of the condenser is zero. Such a 
condenser is called lossless. We will use the abbreviated term $(k, \epsilon)$-condenser
for a lossless condenser with input entropy $k$ (equal to the output entropy)
and error $\epsilon$. Moreover, if a function is a $(k_0, \epsilon)$-condenser for every $k_0 \leq k$, it is
called a $(\leq k, \epsilon)$-condenser. Most known constructions of lossless condensers
(and in particular, all constructions used in this thesis) are $(\leq k, \epsilon)$-condensers
for their entropy requirement $k$.

Traditionally, lossless condensers have been used as intermediate building 
blocks for construction of extractors. Having a good lossless condenser avail- 
able, for construction of extractors it would suffice to focus on the case where 
the input entropy is large. Nevertheless, lossless condensers have been proved 
to be useful for a variety of applications, some of which we will discuss in this 
thesis. 

2.2.2 Almost-Injectivity of Lossless Condensers 

Intuitively, an extractor is an almost "uniformly surjective" mapping. That 
is, the extractor mapping distributes the probability mass of the input source 
almost evenly among the elements of its range. 

On the other hand, a lossless condenser preserves the entire source entropy 
on its output and intuitively, must be an almost injective function when re- 
stricted to the domain defined by the input distribution. In other words, in the 
mapping defined by the condenser "collisions" rarely occur and in this view, 
lossless condensers are useful "hashing" tools. In this section we formalize this 
intuition through a simple practical application. 

Given a source $\mathcal{X}$ and a function $f$, if $f(\mathcal{X})$ has the same entropy as that
of $\mathcal{X}$ (or in other words, if $f$ is a perfectly lossless condenser for $\mathcal{X}$) we expect
that from the outcome of the function, its input when sampled from X must 
be reconstructible. For flat distributions (that is, those that are uniform on 
their support) and considering an error for the condenser, this is shown in the 
following proposition. We will use this simple fact several times throughout 
the thesis. 

Proposition 2.14. Let $\mathcal{X}$ be a flat distribution with min-entropy $\log K$ over
a finite sample space $\Omega$ and $f\colon \Omega \to \Gamma$ be a mapping to a finite set $\Gamma$.

1. If $f(\mathcal{X})$ is $\epsilon$-close to having min-entropy $\log K$, then there is a set $T \subseteq \Gamma$
of size at least $(1 - 2\epsilon)K$ such that

\[ (\forall y \in T \text{ and } \forall x, x' \in \mathrm{supp}(\mathcal{X})) \quad f(x) = y \wedge f(x') = y \implies x = x'. \]

2. Suppose $|\Gamma| \geq K$. If $f(\mathcal{X})$ has a support of size at least $(1 - \epsilon)K$, then
it is $\epsilon$-close to having min-entropy $\log K$.

Proof. Suppose that $\mathcal{X}$ is uniformly supported on a set $S \subseteq \Omega$ of size $K$, and
denote by $\mu$ the distribution $f(\mathcal{X})$ over $\Gamma$. For each $y \in \Gamma$, define

\[ n_y := |\{ x \in \mathrm{supp}(\mathcal{X}) : f(x) = y \}|. \]

Moreover, define $T := \{ y \in \Gamma : n_y = 1 \}$, and similarly, $T' := \{ y \in \Gamma : n_y \geq 2 \}$.
Observe that for each $y \in \Gamma$ we have $\mu(y) = n_y/K$, and also $\mathrm{supp}(\mu) = T \cup T'$.
Thus,

\[ |T| + \sum_{y \in T'} n_y = K. \tag{2.1} \]

Now we show the first assertion. Denote by $\mu'$ a distribution on $\Gamma$ with
min-entropy $\log K$ that is $\epsilon$-close to $\mu$, which is guaranteed to exist by the assumption.
The fact that $\mu$ and $\mu'$ are $\epsilon$-close implies that

\[ \sum_{y \in T'} |\mu(y) - \mu'(y)| \leq \epsilon \implies \sum_{y \in T'} (n_y - 1) \leq \epsilon K. \]

In particular, this means that $|T'| \leq \epsilon K$ (since by the choice of $T'$, for each
$y \in T'$ we have $n_y \geq 2$). Furthermore,

\[ \sum_{y \in T'} (n_y - 1) \leq \epsilon K \implies \sum_{y \in T'} n_y \leq \epsilon K + |T'| \leq 2\epsilon K. \]

This combined with (2.1) gives

\[ |T| = K - \sum_{y \in T'} n_y \geq (1 - 2\epsilon)K \]

as desired.

For the second part, observe that $|T'| \leq \epsilon K$. Let $\mu'$ be any flat distribution
with a support of size $K$ that contains the support of $\mu$. The statistical
distance between $\mu$ and $\mu'$ is equal to the difference between the probability
mass of the two distributions on those elements of $\Gamma$ to which $\mu'$ assigns a
bigger probability, namely,

\[ \frac{1}{K} \, |\mathrm{supp}(\mu') \setminus \mathrm{supp}(\mu)| = \frac{\sum_{y \in T'} (n_y - 1)}{K} = \frac{\sum_{y \in T'} n_y - |T'|}{K} = \frac{K - |T| - |T'|}{K}, \]

where we have used (2.1) for the last equality. But $|T| + |T'| = |\mathrm{supp}(\mu)| \geq
(1 - \epsilon)K$, giving the required bound. □
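The quantities used in this proof are easy to compute for a concrete map. The Python sketch below (the name `collision_profile` is hypothetical) counts, for a flat source given by its support, the outputs with a unique pre-image, i.e. the size of the set T, together with the value of the sum of (n_y - 1) over T' divided by K. By the first part of the proof, this value is a lower bound on any ε for which the hypothesis of the first assertion can hold.

```python
from collections import Counter
from typing import Callable, Hashable, Iterable, Tuple


def collision_profile(
    support: Iterable[Hashable], f: Callable[[Hashable], Hashable]
) -> Tuple[int, float]:
    """Return (|T|, excess / K) for the flat source on `support` mapped by f,
    where excess is the sum of (n_y - 1) over outputs y hit at least twice."""
    points = list(support)
    counts = Counter(f(x) for x in points)                    # the values n_y
    unique = sum(1 for c in counts.values() if c == 1)        # |T|
    excess = sum(c - 1 for c in counts.values() if c >= 2)    # sum over T'
    return unique, excess / len(points)


def example_map(x: int) -> int:
    """Collides 0 with 1 and 2 with 3; injective on all inputs >= 4."""
    return x // 2 if x < 4 else x
```

For the flat source on {0, ..., 15} and `example_map`, the result is (12, 0.125), which meets the bound |T| ≥ (1 - 2ε)K of the first assertion with equality at ε = 1/8.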
As a simple application of this fact, consider the following "source coding"
problem. Suppose that Alice wants to send a message $x$ to Bob through a
noiseless communication channel, and that the message is randomly sampled
from a distribution $\mathcal{X}$. Shannon's source coding theorem roughly states that
there is a compression scheme that encodes $x$ to a binary sequence $y$ of length
$H(\mathcal{X})$ bits on average, where $H(\cdot)$ denotes the Shannon entropy, such that Bob
can perfectly reconstruct $x$ from $y$ (cf. [40, Chapter 5]). If the distribution $\mathcal{X}$
is known to both Alice and Bob, they can use an efficient coding scheme such
as Huffman codes or arithmetic coding to achieve this bound (up to a small
constant number of bits of redundancy).

On the other hand, certain universal compression schemes are known that
guarantee an optimal compression provided that $\mathcal{X}$ satisfies certain
statistical properties. For instance, Lempel-Ziv coding achieves the optimum
compression rate without exact knowledge of $\mathcal{X}$ provided that it is defined by a
stationary, ergodic process (cf. [40, Chapter 13]).

Now consider a situation where the distribution X is arbitrary but only 
known to the receiver Bob. In this case, it is known that there is no way 
for Alice to substantially compress her information without interaction with 
Bob [2]. On the other hand, if we allow interaction, Bob may simply send a
description of the probability distribution X to Alice so she can use a classical 
source coding scheme to compress her information at the entropy. 

Interestingly, it turns out that this task is still possible if the amount of
information sent to Alice is substantially lower than what is needed to fully
encode the probability distribution $\mathcal{X}$. This is particularly useful if the
bandwidth from Alice to Bob is substantially lower than that of the reverse
direction (consider, for example, an ADSL connection) and for this reason, the
problem is dubbed the asymmetric communication channel problem. In particular,
Watkinson et al. [160] obtain a universal scheme with $H(\mathcal{X}) + 2$ bits of
communication from Alice to Bob and $n(H(\mathcal{X}) + 2)$ bits from Bob to Alice, where
$n$ is the bit-length of the message. Moreover, Adler et al. [1] obtain strong
lower bounds on the number of rounds of communication between Alice and
Bob.

Now let us impose a further restriction on $\mathcal{X}$ that it is uniformly
supported on a set $S \subseteq \{0,1\}^n$, and Alice knows nothing about $S$ but its size.
If we disallow interaction between Alice and Bob, there would still be no
deterministic way for Alice to compress her message. This is
easy to observe by noting that any deterministic, and compressing, function
$\varphi\colon \{0,1\}^n \to \{0,1\}^m$, where $m < n$, has an output value with as many as
$2^{n-m}$ pre-images, and an adversarial choice of $S$ that concentrates on the set
of such pre-images would force the compression scheme to fail.
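The pigeonhole argument above can be made concrete for small parameters. The sketch below is illustrative only: `heaviest_fiber` and the truncating compressor `phi` are hypothetical names and choices, not taken from the text.

```python
from collections import defaultdict
from itertools import product

def heaviest_fiber(phi, n):
    """Return the largest set of n-bit inputs that phi maps to a single output."""
    fibers = defaultdict(list)
    for x in product((0, 1), repeat=n):
        fibers[phi(x)].append(x)
    return max(fibers.values(), key=len)

# Hypothetical compressor: keep the first m = 3 of n = 5 bits.
n, m = 5, 3
phi = lambda x: x[:m]
S = heaviest_fiber(phi, n)        # an adversarial support: every message collides
assert len(S) >= 2 ** (n - m)     # pigeonhole bound on the heaviest fiber
```

Any source supported inside such a set `S` is mapped to a single codeword, so Bob cannot distinguish the messages.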

However, let us allow the encoding scheme to be randomized, and err with
a small probability over the randomness of the scheme and the message. In this
case, Alice can take a strong lossless condenser $f\colon \{0,1\}^n \times \{0,1\}^d \to \{0,1\}^m$
for input entropy $k := \log|S|$, choose a uniformly random seed $z \in \{0,1\}^d$,
and transmit $y := (z, f(x, z))$ to Bob. Now we argue that Bob will be able to
recover $x$ from $y$.

Let $\epsilon$ denote the error of the condenser. Since $f$ is a lossless condenser for
$\mathcal{X}$, we know that, for $Z \sim \mathcal{U}_d$ and $X \sim \mathcal{X}$, the distribution of $(Z, f(X, Z))$ is
$\epsilon$-close to some distribution $(\mathcal{U}_d, \mathcal{Y})$, with min-entropy at least $d + k$. Thus
by Corollary 2.13 it follows that, for at least $1 - \sqrt{\epsilon}$ fraction of the choices
of $z \in \{0,1\}^d$, the distribution $\mathcal{Y}_z := f(\mathcal{X}, z)$ is $\sqrt{\epsilon}$-close to having
min-entropy $k$. For any such "good seed" $z$, Proposition 2.14 implies that only
for at most $2\sqrt{\epsilon}$ fraction of the message realizations $x \in S$ can the encoding
$f(x, z)$ be confused with a different encoding $f(x', z)$ for some $x' \in S$, $x' \ne x$.
Altogether we conclude that, from the encoding $y$, Bob can uniquely deduce
$x$ with probability at least $1 - 3\sqrt{\epsilon}$, where the probability is taken over the
randomness of the seed and the message distribution $\mathcal{X}$.

The amount of communication in this encoding scheme is $m + d$ bits. Using
an optimal lossless condenser for $f$, the encoding length becomes $k + O(\log n)$
with a polynomially small (in $n$) error probability (where the exponent of
the polynomial is arbitrary and affects the constant in the logarithmic term).
On the other hand, with the same error probability, the explicit condenser of
Theorem 4.19 would give an encoding length $k + O(\log n)$. Moreover, the
explicit condenser of Theorem 2.22 results in length $k(1 + \alpha) + O_\alpha(\log n)$ for
any arbitrary constant $\alpha > 0$.
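The encoding scheme can be sketched in a few lines. This is a toy illustration under stated assumptions, not the construction analysed above: the condenser is replaced by a random $\mathbb{F}_2$-linear hash (a universal family, which the Leftover Hash Lemma of Section 2.3.1 shows to be a lossless condenser when the output is long enough), so the seed here is far longer than the $O(\log n)$ bits discussed in the text. All function names are hypothetical.

```python
import random

def hash_bits(rows, x):
    """Apply the F_2-linear map given by `rows` (one n-bit mask per output bit)."""
    return tuple(bin(r & x).count("1") & 1 for r in rows)

def alice_encode(x, n, m, rng):
    rows = tuple(rng.getrandbits(n) for _ in range(m))   # the random seed z
    return rows, hash_bits(rows, x)                      # y = (z, f(x, z))

def bob_decode(y, S):
    rows, digest = y
    candidates = [x for x in S if hash_bits(rows, x) == digest]
    return candidates[0] if len(candidates) == 1 else None   # None: ambiguous

rng = random.Random(1)
n, k = 32, 6
S = rng.sample(range(2 ** n), 2 ** k)    # the support, known to Bob only
m = k + 24                               # output length with generous slack over k
x = rng.choice(S)
assert bob_decode(alice_encode(x, n, m, rng), S) == x
```

Alice only needs $n$ and $k = \log|S|$ to encode; Bob decodes by searching $S$ for the unique message consistent with the hash, and fails only when another element of $S$ collides with $x$ under the chosen seed.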

2.3 Constructions 

We now turn to explicit constructions of strong extractors and lossless con- 
densers. 

Using probabilistic arguments, Radhakrishnan and Ta-Shma [122] showed
that, for every $k, n, \epsilon$, there is a strong $(k, \epsilon)$-extractor with seed length
$d = \log(n - k) + 2\log(1/\epsilon) + O(1)$ and output length $m = k - 2\log(1/\epsilon) - O(1)$.
In particular, a random function achieves these parameters with probability
$1 - o(1)$. Moreover, their result shows that this trade-off is almost the best one
can hope for.

Similar trade-offs are known for lossless condensers as well. Specifically, the
probabilistic construction of Radhakrishnan and Ta-Shma has been extended
to the case of lossless condensers by Capalbo et al. [23], where they show that
a random function is with high probability a strong lossless $(k, \epsilon)$-condenser
with seed length $d = \log n + \log(1/\epsilon) + O(1)$ and output length
$m = k + \log(1/\epsilon) + O(1)$. Moreover, this trade-off is almost optimal as well.

In this section, we introduce some important explicit constructions of both 
extractors and lossless condensers that are used as building blocks of various 
constructions in the thesis. In particular, we will discuss extractors and lossless 
condensers obtained by the Leftover Hash Lemma, Trevisan's extractor, and 
a lossless condenser due to Guruswami, Umans, and Vadhan. 
2.3.1 The Leftover Hash Lemma

One of the foremost explicit constructions of extractors is given by the
Leftover Hash Lemma, first stated by Impagliazzo, Levin, and Luby [84]. This
extractor achieves an optimal output length $m = k - 2\log(1/\epsilon)$, albeit with
a substantially large seed length $d = n$. Moreover, the extractor is a linear
function for every fixing of the seed. In its general form, the lemma states
that any universal family of hash functions can be transformed into an
explicit extractor. The universality property required by the hash functions is
captured by the following definition.

Definition 2.15. A family of functions $\mathcal{H} = \{h_1, \ldots, h_D\}$ where
$h_i\colon \{0,1\}^n \to \{0,1\}^m$ for $i = 1, \ldots, D$ is called universal if, for every fixed
choice of $x, x' \in \{0,1\}^n$ such that $x \ne x'$ and a uniformly random
$i \in [D] := \{1, \ldots, D\}$ we have

$$\Pr_i[h_i(x) = h_i(x')] \le 2^{-m}.$$

One of the basic examples of universal hash families is what we call the
linear family, defined as follows. Consider an arbitrary isomorphism
$\varphi\colon \mathbb{F}_2^n \to \mathbb{F}_{2^n}$ between the vector space $\mathbb{F}_2^n$ and the extension field $\mathbb{F}_{2^n}$,
and let $0 < m \le n$ be an arbitrary integer. The linear family $\mathcal{H}_{\mathrm{lin}}$ is the set
$\{h_\alpha : \alpha \in \mathbb{F}_{2^n}\}$ of size $2^n$ that contains a function for each element of the
extension field $\mathbb{F}_{2^n}$. For each $\alpha$, the mapping $h_\alpha$ is given by

$$h_\alpha(x) := (y_1, \ldots, y_m), \quad \text{where } (y_1, \ldots, y_n) := \varphi^{-1}(\alpha \cdot \varphi(x)).$$

Observe that each function $h_\alpha$ can be expressed as a linear mapping from $\mathbb{F}_2^n$
to $\mathbb{F}_2^m$. Below we show that this family is universal.

Proposition 2.16. The linear family $\mathcal{H}_{\mathrm{lin}}$ defined above is universal.

Proof. Let $x, x'$ be different elements of $\mathbb{F}_{2^n}$. Consider the mapping
$f\colon \mathbb{F}_{2^n} \to \mathbb{F}_2^m$ defined as

$$f(x) := (y_1, \ldots, y_m), \quad \text{where } (y_1, \ldots, y_n) := \varphi^{-1}(x),$$

which truncates the binary representation of a field element from $\mathbb{F}_{2^n}$ to $m$
bits. The probability we are trying to estimate in Definition 2.15 is, for a
uniformly random $\alpha \in \mathbb{F}_{2^n}$,

$$\Pr_{\alpha \in \mathbb{F}_{2^n}}[f(\alpha \cdot x) = f(\alpha \cdot x')] = \Pr_{\alpha \in \mathbb{F}_{2^n}}[f(\alpha \cdot (x - x')) = 0].$$

But note that $x - x'$ is a nonzero element of $\mathbb{F}_{2^n}$, and thus, for a uniformly
random $\alpha$, the random variable $\alpha \cdot (x - x')$ is uniformly distributed on $\mathbb{F}_{2^n}$.
It follows that

$$\Pr_{\alpha \in \mathbb{F}_{2^n}}[f(\alpha \cdot (x - x')) = 0] = 2^{-m},$$

implying that $\mathcal{H}_{\mathrm{lin}}$ is a universal family. □
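For a small field the universality of the linear family can be checked exhaustively. The sketch below assumes $n = 4$, represents $\mathbb{F}_{2^4}$ as $\mathbb{F}_2[x]/(x^4 + x + 1)$ with elements stored as 4-bit masks, and takes the low-order $m$ bits as the truncation; these representation choices and the function names are illustrative assumptions rather than part of the text.

```python
N_BITS = 4
MODULUS = 0b10011        # x^4 + x + 1, irreducible over F_2

def gf_mul(a, b):
    """Multiply two elements of F_{2^4}, given as 4-bit masks."""
    result = 0
    while b:
        if b & 1:
            result ^= a
        b >>= 1
        a <<= 1
        if a >> N_BITS:              # degree reached 4: reduce modulo MODULUS
            a ^= MODULUS
    return result

def h(alpha, x, m):
    """h_alpha(x): m coordinates (here the low-order bits) of alpha * x."""
    return gf_mul(alpha, x) & ((1 << m) - 1)

def max_collision_probability(m):
    """Largest Pr_alpha[h_alpha(x) = h_alpha(x')] over all pairs x != x'."""
    field = range(1 << N_BITS)
    worst = 0
    for x in field:
        for x2 in field:
            if x != x2:
                hits = sum(h(a, x, m) == h(a, x2, m) for a in field)
                worst = max(worst, hits)
    return worst / (1 << N_BITS)

for m in range(1, N_BITS + 1):
    assert max_collision_probability(m) == 2 ** -m
```

The collision probability equals $2^{-m}$ for every pair, matching the bound in Definition 2.15 with equality.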
Now we are ready to state and prove the Leftover Hash Lemma. We prove
a straightforward generalization of the lemma which shows that universal hash
families can be used to construct not only strong extractors, but also lossless
condensers.

Theorem 2.17. (Leftover Hash Lemma) Let $\mathcal{H} = \{h_i\colon \mathbb{F}_2^n \to \mathbb{F}_2^m\}_{i \in \mathbb{F}_2^d}$ be a
universal family of hash functions with $2^d$ elements indexed by binary vectors
of length $d$, and define the function $f\colon \mathbb{F}_2^n \times \mathbb{F}_2^d \to \mathbb{F}_2^m$ as $f(x, z) := h_z(x)$.
Then

1. For every $k, \epsilon$ such that $m \le k - 2\log(1/\epsilon)$, the function $f$ is a strong
$(k, \epsilon)$-extractor, and

2. For every $k, \epsilon$ such that $m \ge k + 2\log(1/\epsilon)$, the function $f$ is a strong
lossless $(k, \epsilon)$-condenser.

In particular, by choosing $\mathcal{H} = \mathcal{H}_{\mathrm{lin}}$, it is possible to get explicit extractors
and lossless condensers with seed length $d = n$.

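Proof. Considering Proposition 2.8, it suffices to show the claim when $\mathcal{X}$ is a
flat distribution on a support of size $K := 2^k$. Define $M := 2^m$, $D := 2^d$, and
denote by $\mathcal{Y}$ the distribution of $(Z, f(X, Z))$ over $\mathbb{F}_2^{d+m}$ where $X \sim \mathcal{X}$ and
$Z \sim \mathcal{U}_d$. Let $\mu$ be any flat distribution over $\mathbb{F}_2^{d+m}$ such that
$\mathrm{supp}(\mathcal{Y}) \subseteq \mathrm{supp}(\mu)$. We will first upper bound the $\ell_2$ distance of the two
distributions $\mathcal{Y}$ and $\mu$, which can be expressed as follows:

$$\begin{aligned}
\|\mathcal{Y} - \mu\|_2^2 &= \sum_x (\mathcal{Y}(x) - \mu(x))^2 \\
&= \sum_x \mathcal{Y}(x)^2 + \sum_x \mu(x)^2 - 2 \sum_x \mathcal{Y}(x)\mu(x) \\
&\overset{(a)}{=} \sum_x \mathcal{Y}(x)^2 + \frac{1}{|\mathrm{supp}(\mu)|} - \frac{2}{|\mathrm{supp}(\mu)|} \sum_x \mathcal{Y}(x) \\
&= \sum_x \mathcal{Y}(x)^2 - \frac{1}{|\mathrm{supp}(\mu)|},
\end{aligned} \tag{2.2}$$

where (a) uses the fact that $\mu$ assigns probability $1/|\mathrm{supp}(\mu)|$ to exactly
$|\mathrm{supp}(\mu)|$ elements of $\mathbb{F}_2^{d+m}$ and zeros elsewhere.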

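Now observe that $\mathcal{Y}(x)^2$ is the probability that two independent samples
drawn from $\mathcal{Y}$ turn out to be equal to $x$, and thus, $\sum_x \mathcal{Y}(x)^2$ is the collision
probability of two independent samples from $\mathcal{Y}$, which can be written as

$$\sum_x \mathcal{Y}(x)^2 = \Pr_{Z, Z', X, X'}[(Z, f(X, Z)) = (Z', f(X', Z'))],$$

where $Z, Z' \sim \mathcal{U}_d$ and $X, X' \sim \mathcal{X}$ are independent random variables. We can
rewrite the collision probability as

$$\begin{aligned}
\sum_x \mathcal{Y}(x)^2 &= \Pr[Z = Z'] \cdot \Pr[f(X, Z) = f(X', Z') \mid Z = Z'] \\
&= \frac{1}{D} \cdot \Pr_{Z, X, X'}[h_Z(X) = h_Z(X')] \\
&= \frac{1}{D} \cdot \Bigl(\Pr[X = X'] + \frac{1}{K^2} \sum_{\substack{x, x' \in \mathrm{supp}(\mathcal{X}) \\ x \ne x'}} \Pr_Z[h_Z(x) = h_Z(x')]\Bigr) \\
&\overset{(b)}{\le} \frac{1}{D} \cdot \Bigl(\frac{1}{K} + \frac{1}{K^2} \sum_{x, x' \in \mathrm{supp}(\mathcal{X})} \frac{1}{M}\Bigr) = \frac{1}{DM} \cdot \Bigl(1 + \frac{M}{K}\Bigr),
\end{aligned}$$

where (b) uses the assumption that $\mathcal{H}$ is a universal hash family. Plugging
the bound in (2.2) implies that

$$\|\mathcal{Y} - \mu\|_2 \le \frac{1}{\sqrt{DM}} \cdot \sqrt{1 - \frac{DM}{|\mathrm{supp}(\mu)|} + \frac{M}{K}}.$$

Observe that both $\mathcal{Y}$ and $\mu$ assign zero probabilities to elements of $\{0,1\}^{d+m}$
outside the support of $\mu$. Thus using Cauchy-Schwarz on a domain of size
$|\mathrm{supp}(\mu)|$, the above bound implies that the statistical distance between $\mathcal{Y}$ and
$\mu$ is at most

$$\frac{1}{2} \cdot \sqrt{\frac{|\mathrm{supp}(\mu)|}{DM}} \cdot \sqrt{1 - \frac{DM}{|\mathrm{supp}(\mu)|} + \frac{M}{K}}. \tag{2.3}$$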

Now, for the first part of the theorem, we specialize $\mu$ to the uniform
distribution on $\{0,1\}^{d+m}$, which has a support of size $DM$, and note that by the
assumption that $m \le k - 2\log(1/\epsilon)$ we will have $M \le \epsilon^2 K$. Using (2.3), it
follows that $\mathcal{Y}$ and $\mu$ are $(\epsilon/2)$-close.

On the other hand, for the second part of the theorem, we specialize $\mu$ to
any flat distribution on a support of size $DK$ containing $\mathrm{supp}(\mathcal{Y})$ (note that,
since $\mathcal{X}$ is assumed to be a flat distribution, $\mathcal{Y}$ must have a support of size
at most $DK$). Since $m \ge k + 2\log(1/\epsilon)$, we have $K \le \epsilon^2 M$, and again (2.3)
implies that $\mathcal{Y}$ and $\mu$ are $(\epsilon/2)$-close. □
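The extractor bound can be checked numerically on a toy instance. The sketch below is an illustration under assumptions of our own choosing: it uses the linear family over $\mathbb{F}_{2^4}$ (represented as $\mathbb{F}_2[x]/(x^4 + x + 1)$, low-order bits as the truncation) with $k = 3$ and $m = 1$, and compares the exact statistical distance of $(Z, h_Z(X))$ from uniform against $\frac{1}{2}\sqrt{M/K}$, which is what (2.3) gives when $|\mathrm{supp}(\mu)| = DM$.

```python
from itertools import combinations
from math import sqrt

def gf16_mul(a, b):
    """Multiply in F_{2^4} = F_2[x]/(x^4 + x + 1); elements are 4-bit masks."""
    result = 0
    for _ in range(4):
        if b & 1:
            result ^= a
        b >>= 1
        a <<= 1
        if a & 0b10000:
            a ^= 0b10011
    return result

MUL = [[gf16_mul(z, x) for x in range(16)] for z in range(16)]

def distance_from_uniform(S, m):
    """Exact statistical distance between (Z, h_Z(X)) and uniform, X flat on S."""
    D, M, K = 16, 1 << m, len(S)
    total = 0
    for z in range(D):
        counts = [0] * M
        for x in S:
            counts[MUL[z][x] & (M - 1)] += 1
        # |c/(D*K) - 1/(D*M)| summed over outputs, scaled by D*K*M to stay integral
        total += sum(abs(c * M - K) for c in counts)
    return total / (2 * D * K * M)

k, m = 3, 1
bound = 0.5 * sqrt(2 ** m / 2 ** k)      # (1/2) * sqrt(M / K)
worst = max(distance_from_uniform(S, m) for S in combinations(range(16), 2 ** k))
assert worst <= bound
```

The maximum is taken over all flat sources with $K = 8$, so the assertion exercises the first part of the theorem for every such source at these parameters.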

2.3.2 Trevisan's Extractor 

One of the most important explicit constructions of extractors is due to
Trevisan [152]. Since we will use this extractor at several points in the thesis,
we dedicate this section to sketching the main ideas behind this important
construction.

Trevisan's extractor can be thought of as an "information-theoretic"
variation of Nisan-Wigderson's pseudorandom generator that will be discussed in
detail in Chapter 6. For the purpose of this exposition, we will informally
demonstrate how Nisan-Wigderson's generator works and then discuss
Trevisan's extractor from a coding-theoretic perspective.

Loosely speaking, a pseudorandom generator is an efficient and deterministic
function (where the exact meaning of "efficient" may vary depending on
the context) that transforms a statistically uniform distribution on $d$ bits to
a distribution on $m$ bits, for some $m \gg d$, that "looks random" to any
"restricted" distinguisher. Again the precise meaning of "looking random" and
the exact restriction of the distinguisher may vary. In particular, we require
the output distribution $\mathcal{X}$ of the pseudorandom generator to be such that, for
every restricted distinguisher $D\colon \{0,1\}^m \to \{0,1\}$, we have

$$\Bigl|\Pr_{X \sim \mathcal{X}}[D(X) = 1] - \Pr_{Y \sim \mathcal{U}_m}[D(Y) = 1]\Bigr| \le \epsilon,$$

where $\epsilon$ is a negligible bias. Recall that, in light of Proposition 2.3, this
is very close to what we expect from the output distribution of an extractor,
except that for the case of pseudorandom generators the distinguisher $D$
cannot be an arbitrary function. Indeed, when $m > d$, the output distribution
of a pseudorandom generator cannot be close to uniform and is always
distinguishable by some distinguisher. The main challenge in the construction
of a pseudorandom generator is to exclude the possibility that such a
distinguisher is included in the restricted class of functions under consideration.
As a concrete example, one may require a pseudorandom generator to be a
polynomial-time computable function whose output is a sequence of length
$d^2$ that is indistinguishable by linear-sized Boolean circuits with a bias better
than $d^{-2}$.
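The distinguishing advantage in the display above can be computed by brute force for tiny parameters. The following sketch is purely illustrative: the toy generator `G` (append a parity bit) and both distinguishers are hypothetical, chosen to contrast an unrestricted distinguisher, which always succeeds when $m > d$, with a restricted one that sees nothing.

```python
from itertools import product

def advantage(G, D, d, m):
    """|Pr[D(G(U_d)) = 1] - Pr[D(U_m) = 1]| by exhaustive enumeration."""
    p_gen = sum(D(G(z)) for z in product((0, 1), repeat=d)) / 2 ** d
    p_uni = sum(D(y) for y in product((0, 1), repeat=m)) / 2 ** m
    return abs(p_gen - p_uni)

# Hypothetical toy generator stretching d bits to d + 1 bits.
d = 4
G = lambda z: z + (sum(z) % 2,)
image = {G(z) for z in product((0, 1), repeat=d)}

D_image = lambda y: int(y in image)    # unrestricted: tests membership in the image
D_first = lambda y: y[0]               # restricted: looks at a single output bit

print(advantage(G, D_image, d, d + 1))   # large advantage
print(advantage(G, D_first, d, d + 1))   # no advantage
```

The membership test has advantage at least $1 - 2^{d-m}$ against any generator, which is why the class of distinguishers must be restricted.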

Nisan and Wigderson observed that the hardness of distinguishing the
output distribution from uniform can be derived from a hardness assumption
that is inherent in the way the pseudorandom generator itself is computed.
In a way, their construction shows how to "trade" computational hardness
with pseudorandomness. In a simplified manner, a special instantiation of
this generator can be described as follows: Suppose that a Boolean predicate
$f\colon \{0,1\}^d \to \{0,1\}$ is hard to compute on average by "small" Boolean circuits;
meaning that no circuit consisting of a sufficiently small number of gates (as
determined by a security parameter) is able to compute $f$ substantially better
than a trivial circuit that always outputs a constant value. Then, given a
random seed $Z \in \{0,1\}^d$, the sequence $(Z, f(Z))$ is pseudorandom for small
circuits. The reason can be seen by contradiction. Let us suppose that for
some distinguisher $D$, we have

$$\Bigl|\Pr_{X \sim \mathcal{X}}[D(X) = 1] - \Pr_{Y \sim \mathcal{U}_m}[D(Y) = 1]\Bigr| > \epsilon.$$

By the following simple proposition, such a distinguisher can be transformed
into a predictor for the hard function $f$.

Proposition 2.18. Consider predicates $f\colon \mathbb{F}_2^d \to \mathbb{F}_2$ and $D\colon \mathbb{F}_2^{d+1} \to \mathbb{F}_2$ and
suppose that

$$\Bigl|\Pr_{X \sim \mathcal{U}_d}[D(X, f(X)) = 1] - \Pr_{Y \sim \mathcal{U}_{d+1}}[D(Y) = 1]\Bigr| > \epsilon.$$

Then, there are fixed choices of $a_0, a_1 \in \mathbb{F}_2$ such that

$$\Pr_{X \sim \mathcal{U}_d}[D(X, a_0) + a_1 = f(X)] > \frac{1}{2} + \epsilon.$$

Proof. Without loss of generality, assume that the quantity inside the absolute
value is non-negative (otherwise, one can reason about the negation of $D$).
Consider the following randomized algorithm $A$ that, given $x \in \mathbb{F}_2^d$, tries to
predict $f(x)$: Flip a random coin $r \in \mathbb{F}_2$. If $D(x, r) = 1$, output $r$, and otherwise
output the complement $\bar{r}$.

Intuitively, the algorithm $A$ tries to make a random guess for $f(X)$, and
then feeds it to the distinguisher. As $D$ is more likely to output 1 when the
correct value of $f(X)$ is supplied, $A$ takes the acceptance of $(x, r)$ by $D$ as
evidence that the random guess $r$ has been correct (and vice versa). The precise
analysis can be done as follows.

$$\begin{aligned}
\Pr_{X,r}[A(X) = f(X)] &= \frac{1}{2} \Pr_{X,r}[A(X) = f(X) \mid r = f(X)] + \frac{1}{2} \Pr_{X,r}[A(X) = f(X) \mid r \ne f(X)] \\
&= \frac{1}{2} \Pr_{X,r}[D(X, r) = 1 \mid r = f(X)] + \frac{1}{2} \Pr_{X,r}[D(X, r) = 0 \mid r \ne f(X)] \\
&= \frac{1}{2} \Pr_{X,r}[D(X, r) = 1 \mid r = f(X)] + \frac{1}{2} \Bigl(1 - \Pr_{X,r}[D(X, r) = 1 \mid r \ne f(X)]\Bigr) \\
&= \frac{1}{2} + \Pr_{X,r}[D(X, r) = 1 \mid r = f(X)] \\
&\qquad - \frac{1}{2} \Bigl(\Pr_{X,r}[D(X, r) = 1 \mid r = f(X)] + \Pr_{X,r}[D(X, r) = 1 \mid r \ne f(X)]\Bigr) \\
&= \frac{1}{2} + \Pr_{X}[D(X, f(X)) = 1] - \Pr_{X,r}[D(X, r) = 1] \\
&> \frac{1}{2} + \epsilon.
\end{aligned}$$

Therefore, by averaging, for some fixed choice of $r$ the probability must
remain above $\frac{1}{2} + \epsilon$, implying that one of the functions $D(X, 0)$, $D(X, 1)$ or
their negations must be as good a predictor for $f(X)$ as $A$ is. □
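The transformation from a distinguisher to a predictor can be tried out by enumerating the four candidate predictors $x \mapsto D(x, a_0) + a_1$. The sketch below is a toy illustration; the predicate `f`, the imperfect guess `g`, and the distinguisher `D` built from it are hypothetical examples, not objects from the text.

```python
from itertools import product

def bias(f, D, d):
    """Pr[D(X, f(X)) = 1] - Pr[D(Y) = 1] for uniform X in F_2^d, Y in F_2^(d+1)."""
    xs = list(product((0, 1), repeat=d))
    p_real = sum(D(x, f(x)) for x in xs) / len(xs)
    p_unif = sum(D(x, b) for x in xs for b in (0, 1)) / (2 * len(xs))
    return p_real - p_unif

def best_predictor(f, D, d):
    """Search the four candidates x -> D(x, a0) + a1 (addition over F_2)."""
    xs = list(product((0, 1), repeat=d))
    best = None
    for a0, a1 in product((0, 1), repeat=2):
        success = sum((D(x, a0) ^ a1) == f(x) for x in xs) / len(xs)
        if best is None or success > best[0]:
            best = (success, a0, a1)
    return best

# Hypothetical toy instance: f is majority on 3 bits; D accepts (x, b) when b
# matches an imperfect guess g(x) that agrees with f on 6 of the 8 inputs.
d = 3
f = lambda x: int(sum(x) >= 2)
g = lambda x: x[0]
D = lambda x, b: int(b == g(x))

eps = abs(bias(f, D, d))
success, a0, a1 = best_predictor(f, D, d)
assert success >= 0.5 + eps
```

Here the distinguisher has bias $1/4$, and the best of the four fixed predictors agrees with $f$ on a $3/4$ fraction of the inputs.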

Since the complexity of the predictor is about the same as that of the
distinguisher $D$, and by assumption $f$ cannot be computed by small circuits,
we conclude that the outcome of the generator must be indistinguishable from
uniform by small circuits. Nisan and Wigderson generalized this idea to
obtain generators that output a long sequence of bits that is indistinguishable
from having a uniform distribution. In order to obtain more than one
pseudorandom bit from the random seed, they evaluate the hard function $f$ on
carefully chosen subsequences of the seed (for this to work, the input length
of $f$ is assumed to be substantially smaller than the seed length $d$).

An important observation in Trevisan's work is that Nisan-Wigderson's
pseudorandom generator is a black-box construction. Namely, the generator
merely computes the hard function $f$ at suitably chosen points without caring
much about how this computation is implemented. Similarly, the analysis uses
the distinguisher $D$ as a black-box. If $f$ is computable in polynomial time,
then so is the generator (assuming that it outputs polynomially many bits),
and if $f$ is hard against small circuits, the class of circuits of about the same
size must be fooled by the generator.

How can we obtain an extractor from Nisan-Wigderson's construction?
Recall that the output distribution of an extractor must be indistinguishable from
uniform by all circuits, and not only small ones. Adapting Nisan-Wigderson's
generator for this requirement means that we will need a function $f$ that is
hard for all circuits, something which is obviously impossible. However, this
problem can be resolved if we take many hard functions instead of one, and
force the predictor to simultaneously predict all functions with a reasonable
bias. More precisely, statistical indistinguishability can be obtained if
the function $f$ is sampled from a random distribution, and that is exactly how
Trevisan's extractor uses the supplied weak source. In particular, the
extractor regards the sequence obtained from the weak source as the truth table of
a randomly chosen function, and then applies Nisan-Wigderson's construction
relative to that function.

The exact description of the extractor is given in Construction 2.1. The
extractor assumes the existence of a suitable list-decodable code (see
Appendix A for the terminology) as well as a combinatorial design. Intuitively,
a combinatorial design is a collection of subsets of a universe such that their
pairwise intersections are small. We will study designs more closely in
Chapter 4. In order to obtain a polynomial-time computable extractor, we need an
efficient construction of the underlying list-decodable code and combinatorial
design.

Given: A random sample $X \sim \mathcal{X}$, where $\mathcal{X}$ is a distribution on
$\{0,1\}^n$ with min-entropy at least $k$, and a uniformly distributed
random seed $Z \sim \mathcal{U}_d$ of length $d$. Moreover, the extractor assumes a
$(\frac{1}{2} - \delta, \ell)$ list-decodable binary code $C$ of length $N$ (a power of two)
and size $2^n$, and a combinatorial design $\mathcal{S} := \{S_1, \ldots, S_m\}$, where

- For all $i \in [m]$, $S_i \subseteq [d]$, $|S_i| = \log_2 N$, and

- For all $1 \le i < j \le m$, $|S_i \cap S_j| \le r$.

Output: A binary string $E(X, Z)$ of length $m$.

Construction: Denote the encoding of $X$ under $C$ by $C(X)$. For each
$i \in [m]$, the subsequence of $Z$ picked by the coordinate positions in
$S_i$ (denoted by $Z|_i$) is a string of length $\log_2 N$ and can be regarded
as an integer in $[N]$. Let $C_i(X)$ denote the bit at the $(Z|_i)$th position
of the encoding $C(X)$. Then,

$$E(X, Z) := (C_1(X), \ldots, C_m(X)).$$

Construction 2.1: Trevisan's extractor $E\colon \{0,1\}^n \times \{0,1\}^d \to \{0,1\}^m$.
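To make the construction concrete, here is a toy instantiation. It is only a sketch under assumptions of our own choosing: the list-decodable code is taken to be the Hadamard code of length $N = 8$ (so $n = 3$), and the design is the set of seven lines of the Fano plane over $d = 7$ seed positions, which have size $3 = \log_2 N$ and pairwise intersections of size $r = 1$. These parameters are far too small to give a meaningful extractor; the point is only to show how the seed selects codeword positions.

```python
from itertools import product

# Lines of the Fano plane on points 0..6: any two lines share exactly one point.
FANO_DESIGN = [(0, 1, 2), (0, 3, 4), (0, 5, 6), (1, 3, 5),
               (1, 4, 6), (2, 3, 6), (2, 4, 5)]

def hadamard_bit(x, position):
    """Bit of the Hadamard encoding of x at `position`: inner product over F_2."""
    return bin(x & position).count("1") & 1

def trevisan_toy(x, z, m=7):
    """x: 3-bit source sample (int); z: tuple of d = 7 seed bits."""
    out = []
    for S in FANO_DESIGN[:m]:
        position = 0
        for coord in S:              # read Z restricted to S_i as an integer in [N]
            position = (position << 1) | z[coord]
        out.append(hadamard_bit(x, position))
    return tuple(out)

z = (1, 0, 1, 1, 0, 0, 1)
print(trevisan_toy(0b101, z))
# For a fixed seed the map is linear in x, because the Hadamard code is linear.
assert all(
    trevisan_toy(a ^ b, z) == tuple(s ^ t for s, t in zip(trevisan_toy(a, z), trevisan_toy(b, z)))
    for a, b in product(range(8), repeat=2)
)
```

The final assertion reflects a general feature of the construction: whenever the underlying code is linear, the output is a linear function of the source for each fixed seed.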

An analysis of Trevisan's construction is given by the following theorem,
which is based on the original analysis of [152].

Theorem 2.19. Trevisan's extractor (as described in Construction 2.1) is a
strong $(k, \epsilon)$-extractor provided that $\epsilon > 2m\delta$ and
$k > d + m2^{r+1} + \log(\ell/\epsilon) + 3$.



Proof. In light of Proposition 2.8 it suffices to show the claim when $\mathcal{X}$ is a
flat distribution. Suppose for the sake of contradiction that the distribution of
$(Z, E(X, Z))$ is not $\epsilon$-close to uniform. Without loss of generality, and using
Proposition 2.3, this means that there is a distinguisher $D\colon \{0,1\}^{d+m} \to \{0,1\}$
such that

$$\Pr_{X, Z}[D(Z, E(X, Z)) = 1] - \Pr_{Z, U}[D(Z, U) = 1] > \epsilon, \tag{2.4}$$

where $U = (U_1, \ldots, U_m)$ is a sequence of uniform and independent random
bits. Let $\mathcal{X}' \subseteq \mathrm{supp}(\mathcal{X})$ denote the set of inputs on the support of $\mathcal{X}$ that
satisfy

$$\Pr_{Z}[D(Z, E(x, Z)) = 1] - \Pr_{Z, U}[D(Z, U) = 1] > \frac{\epsilon}{2}. \tag{2.5}$$

Observe that the size of $\mathcal{X}'$ must be at least $\frac{\epsilon}{2}|\mathrm{supp}(\mathcal{X})| = \epsilon 2^{k-1}$, since
otherwise (2.4) cannot be satisfied. In the sequel, fix any $x \in \mathcal{X}'$.

For $i = 0, \ldots, m$, define a hybrid sequence $H_i$ as the random variable
$H_i := (Z, C_1(x), \ldots, C_i(x), U_{i+1}, \ldots, U_m)$. Thus, $H_0$ is a uniformly random
bit sequence and $H_m$ has the same distribution as $(Z, E(x, Z))$. For $i \in [m]$,
define

$$\delta_i := \Pr[D(H_i) = 1] - \Pr[D(H_{i-1}) = 1],$$

where the probability is taken over the randomness of $Z$ and $U$. Now we can
rewrite (2.5) as

$$\Pr[D(H_m) = 1] - \Pr[D(H_0) = 1] > \frac{\epsilon}{2},$$

or equivalently,

$$\sum_{i=1}^{m} \delta_i > \frac{\epsilon}{2}.$$

Therefore, for some $i \in [m]$, we must have $\delta_i > \epsilon/(2m) =: \epsilon'$. Fix such an $i$,
and recall that we have

$$\begin{aligned}
&\Pr[D(Z, C_1(x), \ldots, C_i(x), U_{i+1}, \ldots, U_m) = 1] \\
&\qquad - \Pr[D(Z, C_1(x), \ldots, C_{i-1}(x), U_i, \ldots, U_m) = 1] > \epsilon'.
\end{aligned} \tag{2.6}$$

Now observe that there is a fixing $U_{i+1} = u_{i+1}, \ldots, U_m = u_m$ of the random
bits $U_{i+1}, \ldots, U_m$ that preserves the above bias. In a similar way as we defined
the subsequence $Z|_i$, denote by $Z|_{\bar{i}}$ the subsequence of $Z$ obtained by removing
the coordinate positions of $Z$ picked by $S_i$. Now we note that $C_i(x)$ depends
only on $x$ and $Z|_i$ and is in particular independent of $Z|_{\bar{i}}$. Furthermore, one
can fix $Z|_{\bar{i}}$ (namely, the portion of the random seed outside $S_i$) such that the
bias in (2.6) is preserved. In other words, there is a string $z' \in \{0,1\}^{d - |S_i|}$
such that

$$\begin{aligned}
&\Pr[D(Z, C_1(x), \ldots, C_i(x), u_{i+1}, \ldots, u_m) = 1 \mid (Z|_{\bar{i}}) = z'] \\
&\qquad - \Pr[D(Z, C_1(x), \ldots, C_{i-1}(x), U_i, u_{i+1}, \ldots, u_m) = 1 \mid (Z|_{\bar{i}}) = z'] > \epsilon',
\end{aligned}$$

where the randomness is now only over $U_i$ and $Z|_i$, and all other random
variables are fixed to their appropriate values. Now, Proposition 2.18 can
be used to show that, under the above fixings, there is a fixed choice of bits
$a_0, a_1 \in \mathbb{F}_2$ such that $D$ can be transformed into a predictor for $C_i(x)$; namely,
so that

$$\Pr_Z[D(Z, C_1(x), \ldots, C_{i-1}(x), a_0, u_{i+1}, \ldots, u_m) + a_1 = C_i(x) \mid (Z|_{\bar{i}}) = z'] > \frac{1}{2} + \epsilon'.$$

Since $Z|_i$ is a uniformly distributed random variable, the above probability can
be interpreted in coding-theoretic terms as follows: By running through all the
$N$ possibilities of $Z|_i$, the predictor constructed from $D$ can correctly recover
the encoding $C(x)$ at more than a $\frac{1}{2} + \epsilon'$ fraction of the positions. Therefore,
the distinguisher $D$ can be transformed into a word $w \in \mathbb{F}_2^N$ that has an
agreement above $\frac{1}{2} + \frac{\epsilon}{2m}$ with $C(x)$.

Now a crucial observation is that the word $w$ can be obtained from $D$
without any knowledge of $x$, as long as a correct "advice" string consisting of
the appropriate fixings of $i, u_{i+1}, \ldots, u_m, a_0, a_1, z'$, and the truth tables of
$C_1(x), \ldots, C_{i-1}(x)$ as functions of $Z|_i$ are available. Here is where the small
intersection property of the design $\mathcal{S}$ comes into play: Each $C_j(x)$ (when $j \ne i$)
depends on at most $r$ of the bits in $Z|_i$, and therefore, $C_j(x)$ as a function of
$Z|_i$ can be fully described by its evaluation on at most $2^r$ points (that can be
much smaller than $2^{|S_i|} = N$). This means that the number of possibilities for
the advice string is at most

$$m \cdot 2^m \cdot 4 \cdot 2^{d - \log N} \cdot 2^{m 2^r} = \frac{m}{N} \cdot 2^{d + m(2^r + 1) + 2} \le 2^{d + m(2^r + 1) + 2} =: T.$$

Therefore, regardless of the choice of $x \in \mathcal{X}'$, there are words $w_1, \ldots, w_T \in \mathbb{F}_2^N$
(one for each possibility of the advice string) such that at least one
(corresponding to the "correct" advice) has an agreement better than $\frac{1}{2} + \epsilon'$ with
$C(x)$. This, in turn, implies that there is a set $\mathcal{X}'' \subseteq \mathcal{X}'$ of size at least
$|\mathcal{X}'|/T \ge \epsilon 2^{k-1}/T$ and a fixed $j \in [T]$ such that, for every $x \in \mathcal{X}''$, the
codeword $C(x)$ has an agreement better than $\frac{1}{2} + \epsilon'$ with $w_j$. As long as $\delta < \epsilon'$,
the number of such codewords can be at most $\ell$ (by the list-decodability of
$C$), and we will reach the desired contradiction (completing the proof) if
the list size $\ell$ is small enough; specifically, if

$$\ell < \frac{\epsilon 2^{k-1}}{T},$$

which holds by the assumption of the theorem. □
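For reference, the last step can be spelled out. The following is only a rearrangement of the quantities defined in the proof, with $T = 2^{d + m(2^r + 1) + 2}$ the bound on the number of advice strings and the contradiction requiring $\ell < \epsilon 2^{k-1}/T$:

```latex
\begin{align*}
\ell < \frac{\epsilon\, 2^{k-1}}{T}
  &\iff \log \ell < \log \epsilon + (k - 1) - \bigl(d + m(2^r + 1) + 2\bigr) \\
  &\iff k > d + m(2^r + 1) + \log(\ell/\epsilon) + 3 .
\end{align*}
```

Since $m 2^{r+1} \ge m(2^r + 1)$ for every $r \ge 0$, the hypothesis $k > d + m2^{r+1} + \log(\ell/\epsilon) + 3$ of Theorem 2.19 is sufficient for this inequality.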

By an appropriate choice of the underlying combinatorial design $\mathcal{S}$ and
the list-decodable code $C$ (namely, concatenation of the Reed-Solomon code
and the Hadamard code as described in Section A.5), Trevisan [152] obtained
a strong extractor with output length $k^{1-\alpha}$, for any fixed constant $\alpha > 0$,
and seed length $d = O(\log^2(n/\epsilon)/\log k)$. In a subsequent work, Raz, Reingold
and Vadhan observed that a weaker notion of combinatorial designs suffices
for this construction to work. Using this idea and a careful choice of the
list-decodable code $C$, they managed to improve Trevisan's extractor so that it
extracts almost the entire source entropy. Specifically, their improvement can
be summarized as follows.

Theorem 2.20. [123] For every $n, k, m \in \mathbb{N}$ ($m \le k \le n$) and $\epsilon > 0$, there
is an explicit strong $(k, \epsilon)$-extractor $\mathrm{Tre}\colon \{0,1\}^n \times \{0,1\}^d \to \{0,1\}^m$ with
$d = O(\log^2(n/\epsilon) \cdot \log(1/\alpha))$, where $\alpha := k/(m - 1) - 1$ must be less than
$1/2$. □
Given: A random sample $X \sim \mathcal{X}$, where $\mathcal{X}$ is a distribution on $\mathbb{F}_q^n$
with min-entropy at least $k$, and a uniformly distributed random
seed $Z \sim \mathcal{U}_{\mathbb{F}_q}$ over $\mathbb{F}_q$.

Output: A vector $C(X, Z)$ of length $\ell$ over $\mathbb{F}_q$.

Construction: Take any irreducible univariate polynomial $g$ of degree
$n$ over $\mathbb{F}_q$, and interpret the input $X$ as the coefficient vector
of a random univariate polynomial $F$ of degree $n - 1$ over $\mathbb{F}_q$. Then,
for an integer parameter $h$, the output is given by

$$C(X, Z) := (F(Z), F_1(Z), \ldots, F_{\ell-1}(Z)),$$

where we have used the shorthand $F_i := F^{h^i} \bmod g$.

Construction 2.2: Guruswami-Umans-Vadhan's Condenser $C\colon \mathbb{F}_q^n \times \mathbb{F}_q \to \mathbb{F}_q^\ell$.



Observe that, as long as the list-decodable code C is linear, Trevisan's 
extractor (as well as its improvement above) becomes linear as well, meaning 
that it can be described as a linear function of the weak source for every fixed 
choice of the seed. We will make crucial use of this observation at several 
points in the thesis. 

2.3.3 Guruswami-Umans-Vadhan's Condenser 

One of the important constructions of lossless condensers that we will use
in this thesis is the coding-theoretic construction of Guruswami, Umans and
Vadhan [78]. In this section, we discuss the construction (Construction 2.2)
and its analysis (Theorem 2.22).

We remark that this construction is inspired by a variation of Reed-Solomon
codes due to Parvaresh and Vardy [118]. Specifically, for a given
$x \in \mathbb{F}_q^n$, arranging the outcomes of the condenser $C(x, z)$ for all possibilities
of the seed $z \in \mathbb{F}_q$ results in the encoding of the input $x$ using a
Parvaresh-Vardy code. Moreover, Parvaresh-Vardy codes are equipped with an efficient
list-decoding algorithm that is implicit in the analysis of the condenser. The
main technical part of the analysis is given by the following theorem.



Theorem 2.21. [78] The mapping defined in Construction 2.2 is a strong
$(k, \epsilon)$ lossless condenser with error $\epsilon := (n - 1)(h - 1)\ell/q$, provided that
$\ell \ge k/\log h$ (thus, under the above conditions the mapping becomes a strong
$(\le k, \epsilon)$-condenser as well).
Proof. Without loss of generality (using Proposition 2.8), assume that $\mathcal{X}$ is
uniformly distributed on a subset of $\mathbb{F}_q^n$ of size $K := 2^k$. Let
$D := q - (n - 1)(h - 1)\ell$. Define the random variable

$$Y := (Z, F(Z), F_1(Z), \ldots, F_{\ell-1}(Z)),$$

and denote by $T \subseteq \mathbb{F}_q^{\ell+1}$ the set that supports the distribution of $Y$; i.e., the
set of vectors in $\mathbb{F}_q^{\ell+1}$ for which $Y$ has a nonzero probability of being assigned
to. Our goal is to show that $|T| \ge DK$. Combined with the second part
of Proposition 2.14, this will prove the theorem, since we will know that the
distribution of $(Z, C(X, Z))$ has a support of size at least $(1 - \epsilon)q2^k$.

Assume, for the sake of contradiction, that $|T| < DK$. Then the set of
points in $T$ can be interpolated by a nonzero multivariate low-degree
polynomial of the form

$$Q(z, z_1, \ldots, z_\ell) = \sum_{i=0}^{D-1} z^i Q'_i(z_1, \ldots, z_\ell),$$

where each monomial $z_1^{j_1} \cdots z_\ell^{j_\ell}$ in every $Q'_i$ has weighted degree
$j_1 + h j_2 + h^2 j_3 + \cdots + h^{\ell-1} j_\ell$ at most $K - 1 < h^\ell$ and individual degrees less
than $h$ (this condition can be assured by taking $j_1, \ldots, j_\ell$ to be the base-$h$
representation of an integer between 0 and $K - 1$). Note that $Q$ can be described
by its $DK$ coefficients, and each point on $T$ specifies a linear constraint on
their choice. Since the number of constraints is less than the number of
unknowns, we know that a nonzero polynomial $Q$ vanishes on the set $T$. Fix a
nonzero choice of $Q$ that has the lowest degree in the first variable $z$. This
assures that if we write down $Q$ as

$$Q(z, z_1, \ldots, z_\ell) = \sum_{j = (j_1, \ldots, j_\ell)} Q_j(z)\, z_1^{j_1} \cdots z_\ell^{j_\ell},$$

the polynomials $Q_j(z)$ do not have common irreducible factors (otherwise we
could divide by the common factor and contradict minimality of the degree).
In particular at least one of the $Q_j$'s must be nonzero modulo the irreducible
polynomial $g$.

Now consider the set $S$ of univariate polynomials of degree less than $n$
chosen so that

$$f \in S \iff (\forall z \in \mathbb{F}_q)\colon (z, f(z), f_1(z), \ldots, f_{\ell-1}(z)) \in T,$$

where, similarly as before, we have used the shorthand $f_i$ for $(f^{h^i} \bmod g)$.
Note that, if we regard $\mathrm{supp}(\mathcal{X})$ as a set of low-degree univariate polynomials,
by construction of the condenser this set must be contained in $S$. Therefore,
to reach the desired contradiction, it suffices to show that $|S| < K$.

Let $f$ be any polynomial in $S$. By the definition of $S$, the univariate
polynomial $Q(z, f(z), f_1(z), \ldots, f_{\ell-1}(z))$ must have $q$ zeros (namely, all the
elements of $\mathbb{F}_q$). But the total degree of this polynomial is at most
$D - 1 + (n - 1)(h - 1)\ell = q - 1$, and thus, the polynomial must be identically zero,
and in particular, identically zero modulo $g$. Thus, we have the polynomial
identity

$$Q(z, f(z), f^h(z), \ldots, f^{h^{\ell-1}}(z)) \equiv 0 \mod g(z),$$

and by expanding the identity, that

$$\sum_{j = (j_1, \ldots, j_\ell)} (Q_j(z) \bmod g(z)) \cdot (f(z))^{j_1} (f^h(z))^{j_2} \cdots (f^{h^{\ell-1}}(z))^{j_\ell} \equiv 0,$$

which simplifies to the identity

$$\sum_{j = (j_1, \ldots, j_\ell)} (Q_j(z) \bmod g(z)) \cdot (f(z))^{j_1 + j_2 h + \cdots + j_\ell h^{\ell-1}} \equiv 0. \tag{2.7}$$

Consider the degree $n$ field extension $\mathbb{F} = \mathbb{F}_q[z]/g(z)$ of $\mathbb{F}_q$, which is
isomorphic to the set of $\mathbb{F}_q$-polynomials of degree smaller than $n$. Under this
notation, for every $j$ let $\alpha_j \in \mathbb{F}$ be the extension field element corresponding
to the $\mathbb{F}_q$-polynomial $(Q_j(z) \bmod g(z))$. Recall that, by our choice of $Q$, at
least one of the $\alpha_j$'s is nonzero, and (2.7) implies that the nonzero univariate
$\mathbb{F}$-polynomial

$$\sum_{j = (j_1, \ldots, j_\ell)} \alpha_j z^{j_1 + j_2 h + \cdots + j_\ell h^{\ell-1}}$$

has $f$, regarded as an element of $\mathbb{F}$, as one of its zeros. The degree of this
polynomial is less than $K$ and thus it can have less than $K$ zeros. Thus we
conclude that $|S| < K$ and get the desired contradiction. □

By a careful choice of the parameters $h$ and $q$ in the above construction
(roughly, $h \approx (2nk/\epsilon)^{1/\alpha}$ and $q \approx h^{1+\alpha}$ for an arbitrary constant $\alpha > 0$ and error
$\epsilon$), Guruswami et al. derived the following corollary of the above theorem:

Theorem 2.22. [78] For all constants $\alpha \in (0,1)$ and every $k \le n \in \mathbb{N}$,
$\epsilon > 0$ there is an explicit strong $(k, \epsilon)$ lossless condenser with seed length
$d = (1 + 1/\alpha) \log(nk/\epsilon) + O(1)$ and output length $m = d + (1 + \alpha)k$. □

Using a straightforward observation, we slightly strengthen this result and 
show that in fact the parameters can be set up in such a way that the resulting 
lossless condenser becomes linear. Linearity of the condenser is a property that 
is particularly useful for the results obtained in Chapter 5.

Corollary 2.23. Let $p$ be a fixed prime power and $\alpha > 0$ be an arbitrary
constant. Then, for parameters $n \in \mathbb{N}$, $k \le n \log p$, and $\epsilon > 0$, there is
an explicit strong $(\le k, \epsilon)$ lossless condenser $f\colon \mathbb{F}_p^n \times \{0,1\}^d \to \mathbb{F}_p^m$ with
seed length $d \le (1 + 1/\alpha)(\log(nk/\epsilon) + O(1))$ and output length satisfying
$m \log p \le d + (1 + \alpha)k$. Moreover, $f$ is a linear function (over $\mathbb{F}_p$) for every
fixed choice of the seed. (All unsubscripted logarithms are to the base 2.)

Proof. We set up the parameters of the condenser $C$ given by Construction 2.2
and apply Theorem 2.21. The range of the parameters is mostly similar to
what was chosen in the original result of Guruswami et al. [78].

Letting $h_0 := (2p^2 nk/\epsilon)^{1/\alpha}$, we take $h$ to be an integer power of $p$ in the range
$[h_0, p h_0]$. Also, let $\ell := \lceil k/\log h \rceil$ so that the condition $\ell \ge k/\log h$ required
by Theorem 2.21 is satisfied. Finally, let $q_0 := nh\ell/\epsilon$ and choose the field size
$q$ to be an integer power of $p$ in the range $[q_0, p q_0]$.

We choose the input length of the condenser $C$ to be equal to $n$. Note
that $C$ is defined over $\mathbb{F}_q$, and we need a condenser over $\mathbb{F}_p$. Since $p$ is a fixed
parameter, we can ensure that $q \ge p$ (for large enough $n$), so that $\mathbb{F}_p$ is a
subfield of $\mathbb{F}_q$. For $x \in \mathbb{F}_p^n$ and $z \in \{0,1\}^d$, let $y := C(x, z) \in \mathbb{F}_q^\ell$, where $x$ is
regarded as a vector over the extension $\mathbb{F}_q$ of $\mathbb{F}_p$. We define the output of the
condenser $f(x, z)$ to be the vector $y$ regarded as a vector of length $\ell \log_p q$ over
$\mathbb{F}_p$ (by expanding each element of $\mathbb{F}_q$ as a vector of length $\log_p q$ over $\mathbb{F}_p$). It
can be clearly seen that $f$ is a strong $(\le k, \epsilon)$-condenser if $C$ is.



By Theorem 2.21, $C$ is a strong lossless condenser with error upper bounded
by

$$\frac{(n-1)(h-1)\ell}{q} \le \frac{nh\ell}{q_0} = \epsilon.$$

It remains to analyze the seed length and the output length of the condenser.
For the output length of the condenser, we have

$$m \log p = \ell \log q \le (1 + k/\log h) \log q \le d + k(\log q)/(\log h),$$

where the last inequality is due to the fact that we have $d = \lceil \log q \rceil$. Thus
in order to show the desired upper bound on the output length, it suffices to
show that $\log q \le (1 + \alpha) \log h_0$ (recall that $h \ge h_0$). We have

$$\log q \le \log(p q_0) = \log(pnh\ell/\epsilon) \le \log h_0 + \log(p^2 n\ell/\epsilon)$$

and our task is reduced to showing that $p^2 n\ell/\epsilon \le h_0^{\alpha} = 2p^2 nk/\epsilon$. But this
bound is obviously valid by the choice of $\ell \le 1 + k/\log h$.
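The seed length is $d = \lceil \log q \rceil$ for which we have

$$\begin{aligned}
d \le \log q + 1 &\le \log q_0 + O(1) \\
&\le \log(n h_0 \ell/\epsilon) + O(1) \\
&\le \log(n h_0 k/\epsilon) + O(1) \\
&\le \log(nk/\epsilon) + \frac{1}{\alpha} \log(2p^2 nk/\epsilon) + O(1) \\
&\le \left(1 + \frac{1}{\alpha}\right)(\log(nk/\epsilon) + O(1))
\end{aligned}$$

as desired.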
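The parameter choices made in this proof can be checked numerically. The following is only a minimal sketch under stated assumptions: the function name `condenser_parameters` and the concrete values in the usage lines are hypothetical, floating-point arithmetic is used for $h_0$ and $q_0$, and $h$ and $q$ are taken to be the smallest powers of $p$ in their allowed ranges.

```python
import math

def condenser_parameters(p, n, k, eps, alpha):
    """Pick h, ell, q, d following the choices in the proof (illustrative sketch)."""
    h0 = (2 * p * p * n * k / eps) ** (1 / alpha)
    h = 1
    while h < h0:            # smallest power of p that is >= h0, so h is in [h0, p*h0]
        h *= p
    ell = math.ceil(k / math.log2(h))
    q0 = n * h * ell / eps
    q = 1
    while q < q0:            # smallest power of p that is >= q0, so q is in [q0, p*q0]
        q *= p
    d = (q - 1).bit_length() # equals ceil(log2(q)) for an integer q >= 1
    return {"h": h, "ell": ell, "q": q, "d": d}

# Hypothetical example values (not taken from the text).
n, k, eps, alpha = 1024, 64, 2 ** -10, 1.0
params = condenser_parameters(2, n, k, eps, alpha)
error_bound = (n - 1) * (params["h"] - 1) * params["ell"] / params["q"]
output_bits = params["ell"] * math.log2(params["q"])
```

For these values one can compare `error_bound` against `eps` and `output_bits` against `d + (1 + alpha) * k`, which are the two inequalities established above.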

Since $\mathbb{F}_q$ has a fixed characteristic, an efficient deterministic algorithm for
representation and manipulation of the field elements is available [138], which
implies that the condenser is polynomial-time computable and is thus explicit.
Moreover, since $h$ is taken as an integer power of $p$ and $\mathbb{F}_q$ is an extension of
$\mathbb{F}_p$, for any choice of polynomials $F, F', G \in \mathbb{F}_q[X]$, subfield elements $a, b \in \mathbb{F}_p$,
and integer $i \ge 0$, we have

$$(aF + bF')^{h^i} \equiv aF^{h^i} + bF'^{h^i} \mod G,$$

meaning that raising a polynomial to power $h^i$ is an $\mathbb{F}_p$-linear operation.
Therefore, the mapping $C$ that defines the condenser (Construction 2.2) is
$\mathbb{F}_p$-linear for every fixed seed. This in turn implies that the final condenser $f$
is linear, as claimed. □
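The linearity identity used at the end of the proof can be tested on small instances. The sketch below is an illustration only and covers the special case where $p$ is prime and $q = p$, so that polynomials have coefficients in $\mathbb{F}_p$ and are stored as coefficient lists (lowest degree first). All function names and the sample polynomials are hypothetical.

```python
def poly_mod(a, g, p):
    """Remainder of a modulo the monic polynomial g over F_p."""
    a = [c % p for c in a]
    dg = len(g) - 1
    for i in range(len(a) - 1, dg - 1, -1):
        c = a[i]
        if c:
            for j in range(dg + 1):
                a[i - dg + j] = (a[i - dg + j] - c * g[j]) % p
    a = a[:dg]
    while a and a[-1] == 0:   # drop leading zero coefficients
        a.pop()
    return a

def poly_mul(a, b, p):
    if not a or not b:
        return []
    res = [0] * (len(a) + len(b) - 1)
    for i, x in enumerate(a):
        for j, y in enumerate(b):
            res[i + j] = (res[i + j] + x * y) % p
    return res

def poly_pow_mod(a, e, g, p):
    """a^e mod g by repeated squaring (g monic, degree >= 1)."""
    result, base = [1], poly_mod(a, g, p)
    while e:
        if e & 1:
            result = poly_mod(poly_mul(result, base, p), g, p)
        base = poly_mod(poly_mul(base, base, p), g, p)
        e >>= 1
    return result

def lin_comb(a, F, b, F2, p):
    n = max(len(F), len(F2))
    get = lambda P, i: P[i] if i < len(P) else 0
    return [(a * get(F, i) + b * get(F2, i)) % p for i in range(n)]

def power_map_is_linear(p, G, F, F2, e):
    """Check (aF + bF')^e == a F^e + b F'^e (mod G) for all a, b in F_p."""
    for a in range(p):
        for b in range(p):
            lhs = poly_pow_mod(lin_comb(a, F, b, F2, p), e, G, p)
            rhs = poly_mod(lin_comb(a, poly_pow_mod(F, e, G, p),
                                    b, poly_pow_mod(F2, e, G, p), p), G, p)
            if lhs != rhs:
                return False
    return True
```

With $p = 3$ and $h = 9$, calling `power_map_is_linear(3, G, F, F2, 9 ** 2)` tests the exponent $h^2$; an exponent that is not a power of $p$, such as 2, generally fails the check because of the cross term $2FF'$.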



Guruswami et al. used the lossless condenser above as an intermediate 
building block for construction of an extractor that is optimal up to constant 
factors and extracts almost the entire source entropy. Namely, they proved 
the following result that will be useful for us in later chapters. 

Theorem 2.24. [78] For all positive integers $n \ge k$ and all $\epsilon > 0$, there
is an explicit strong $(k, \epsilon)$-extractor $\mathrm{Ext}\colon \{0,1\}^n \times \{0,1\}^d \to \mathbb{F}_2^m$ with
$m = k - 2\log(1/\epsilon) - O(1)$ and $d = \log n + O(\log k \cdot \log(k/\epsilon))$. □




Johann Sebastian Bach (1685-1750): Chorale Prelude in F minor 
BWV 639 "Ich ruf zu dir, Herr Jesu Christ". Piano transcription by 
Ferruccio Busoni (1866-1924). 



"Music is meaningless noise 
unless it touches a receiving 
mind. " 

— Paul Hindemith 



Chapter 3 



The Wiretap Channel Problem 



Suppose that Alice wants to send a message to Bob through a communication 
channel, and that the message is partially observable by an intruder. This sce- 
nario arises in various practical situations. For instance, in a packet network, 
the sequence transmitted by Alice through the channel can be fragmented 
into small packets at the source and/or along the way and different packets 
might be routed through different paths in the network in which an intruder 
may have compromised some of the intermediate routers. An example that 
is similar in spirit is furnished by transmission of a piece of information from 
multiple senders to one receiver, across different delivery media, such as satel- 
lite, wireless, and/or wired networks. Due to limited resources, a potential 
intruder may be able to observe only a fraction of the lines of transmission, 
and hence only partially observe the message. As another example, one can 
consider secure storage of data on a distributed medium that is physically 
accessible in parts by an intruder, or a sensitive file on a hard drive that 
is erased from the file system but is only partially overwritten with new or 
random information, and hence, is partially exposed to a malicious party. 

An obvious approach to solve this problem is to use a secret key to encrypt 
the information at the source. However, almost all practical cryptographic 
techniques are shown to be secure only under unproven hardness assumptions 
and the assumption that the intruder possesses bounded computational power. 
This might be undesirable in certain situations. Moreover, the key agreement 
problem has its own challenges. 

In the problem that we consider in this chapter, we assume the intruder 
to be information theoretically limited, and our goal will be to employ this 
limitation and construct a protocol that provides unconditional, information- 
theoretic security, even in the presence of a computationally unbounded ad- 
versary. 

Figure 3.1: The Wiretap II Problem.

The problem described above was first formalized by Wyner [165] and
subsequently by Ozarow and Wyner [116] as an information-theoretic problem.
In its most basic setting, this problem is known as the wiretap II problem (the
description given here follows [116]):



Consider a communication system with a source which outputs
a sequence $X = (X_1, \ldots, X_m)$ in $\{0,1\}^m$ uniformly at random.
A randomized algorithm, called the encoder, maps the output of
the source to a binary string $Y \in \{0,1\}^n$. The output of the
encoder is then sent through a noiseless channel (called the direct
channel) and is eventually delivered to a decoder¹ $D$ which maps $Y$
back to $X$. Along the way, an intruder arbitrarily picks a subset
$S \subseteq [n] := \{1, \ldots, n\}$ of size $t \le n$, and is allowed to observe²
$Z := Y|_S$ (through a so-called wiretap channel), i.e., $Y$ on the
coordinate positions corresponding to the set $S$. The goal is to
make sure that the intruder learns as little as possible about $X$,
regardless of the choice of $S$.



The system defined above is illustrated in Figure 3.1. The security of the
system is defined by the following conditional entropy, known as "equivocation":

$$\Delta := \min_{S\colon |S| = t} H(X \mid Z).$$

When $\Delta = H(X) = m$, the intruder obtains no information about the
transmitted message and we have perfect privacy in the system. Moreover, when
$\Delta \to m$ as $m \to \infty$, we call the system asymptotically perfectly private.
These two cases correspond to what is known in the literature as "strong
secrecy". A weaker requirement (known as "weak secrecy") would be to have
$m - \Delta = o(m)$.
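For very small systems the equivocation can be computed by brute force. The following sketch assumes a binary alphabet and a toy encoder that is not taken from the text: a one-bit message $x$ is sent as $(z, x \oplus z)$ for a uniform seed bit $z$. The function names are hypothetical.

```python
from collections import Counter
from itertools import combinations, product
from math import log2

def equivocation(encode, m, r, n, t):
    """min over |S| = t of H(X | Y|_S) in bits, for uniform message and seed."""
    best = None
    total = 2 ** (m + r)
    for S in combinations(range(n), t):
        joint, marginal = Counter(), Counter()
        for x in product((0, 1), repeat=m):
            for z in product((0, 1), repeat=r):
                y = encode(x, z)
                w = tuple(y[i] for i in S)   # the intruder's observation Y|_S
                joint[(x, w)] += 1
                marginal[w] += 1
        h = -sum(c / total * log2(c / marginal[w]) for (x, w), c in joint.items())
        best = h if best is None else min(best, h)
    return best

def toy_encoder(x, z):
    # Hypothetical encoder: send the seed bit and the masked message bit.
    return (z[0], x[0] ^ z[0])

def leaky_encoder(x, z):
    # Hypothetical bad encoder: the message bit is sent in the clear.
    return (x[0], z[0])
```

Here `equivocation(toy_encoder, 1, 1, 2, 1)` evaluates the case $t = 1$, where observing either single coordinate should reveal nothing about $x$, while `leaky_encoder` loses all secrecy as soon as the intruder picks the first coordinate.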



¹ Ozarow and Wyner also consider the case in which the decoder errs with negligible
probability, but we are going to consider only error-free decoders.

² For a vector $x = (x_1, x_2, \ldots, x_n)$ and a subset $S \subseteq [n]$, we denote by $x|_S$ the vector of
length $|S|$ that is obtained from $x$ by removing all the coordinates $x_i$, $i \notin S$.
Remark 3.1. The assumption that X is sampled from a uniformly random
source should not be confused with the fact that Alice is transmitting one 
particular message to Bob that is fixed and known to her before the trans- 
mission. In this case, the randomness of X in the model captures the a priori 
uncertainty about X for the outside world, and in particular the intruder, but 
not the transmitter. 

As an intuitive example, suppose that a random key is agreed upon be- 
tween Alice and a trusted third party, and now Alice wishes to securely send 
her particular key to Bob over a wiretapped channel. Or, assume that Alice 
wishes to send an audio stream to Bob that is encoded and compressed using 
a conventional audio encoding method. 

Furthermore, the particular choice of the distribution on X as a uniformly 
random sequence will cause no loss of generality. If the distribution of X is 
publicly known to be non-uniform, the transmitter can use a suitable source- 
coding scheme to compress the source to its entropy prior to the transmission, 
and ensure that from the intruder's point of view, X is uniformly distributed. 
On the other hand, it is also easy to see that if a protocol achieves perfect 
privacy under uniform message distribution, it achieves perfect privacy under 
any other distribution as well. 

3.1 The Formal Model 

The model that we will be considering in this chapter is motivated by the 
original wiretap channel problem but is more stringent in terms of its security 
requirements. In particular, instead of using Shannon entropy as a measure of 
uncertainty, we will rely on statistical indistinguishability which is a stronger 
measure that is more widely used in cryptography. 

Definition 3.2. Let $\Sigma$ be a set of size $q$, $m$ and $n$ be positive integers, and
$\epsilon, \gamma \ge 0$. A $(t, \epsilon, \gamma)_q$-resilient wiretap protocol of block length $n$ and message
length $m$ is a pair of functions $E\colon \Sigma^m \times \{0,1\}^r \to \Sigma^n$ (the encoder) and
$D\colon \Sigma^n \to \Sigma^m$ (the decoder) that are computable in time polynomial in $m$,
such that

(a) (Decodability) For all $x \in \Sigma^m$ and all $z \in \{0,1\}^r$ we have $D(E(x, z)) = x$,

(b) (Resiliency) Let $X \sim \mathcal{U}_{\Sigma^m}$, $R \sim \mathcal{U}_r$, and $Y = E(X, R)$. For a set $S \subseteq [n]$
and $w \in \Sigma^{|S|}$, let $X_{S,w}$ denote the distribution of $X$ conditioned on the
event $Y|_S = w$. Define the set of bad observations as

$$B_S := \{w \in \Sigma^{|S|} \mid \mathrm{dist}(X_{S,w}, \mathcal{U}_{\Sigma^m}) > \epsilon\},$$

where $\mathrm{dist}(\cdot, \cdot)$ denotes the statistical distance between two distributions.
Then we require that for every $S \subseteq [n]$ of size at most $t$, $\Pr[Y|_S \in B_S] \le \gamma$,
where the probability is over the randomness of $X$ and $R$.
The encoding of a vector $x \in \Sigma^m$ is accomplished by choosing a vector
$Z \in \{0,1\}^r$ uniformly at random, and calculating $E(x, Z)$. The quantities
$R := m/n$, $\epsilon$, and $\gamma$ are called the rate, the error, and the leakage of the
protocol, respectively. Moreover, we call $\delta := t/n$ the (relative) resilience of
the protocol.
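The resiliency condition can be evaluated exhaustively for tiny protocols. The sketch below is an illustration under assumptions: the alphabet is binary, the encoder is a hypothetical toy (a one-bit message $x$ sent as $(z, x \oplus z)$), and the function `worst_case_leakage` returns the largest value of $\Pr[Y|_S \in B_S]$ over all sets $S$ of size at most $t$, using exact rational arithmetic.

```python
from collections import Counter
from fractions import Fraction
from itertools import combinations, product

def worst_case_leakage(encode, m, r, n, t, eps):
    """max over |S| <= t of Pr[Y|_S in B_S] for uniform message and seed."""
    messages = list(product((0, 1), repeat=m))
    total = 2 ** (m + r)
    worst = Fraction(0)
    for size in range(t + 1):
        for S in combinations(range(n), size):
            by_obs = {}
            for x in messages:
                for z in product((0, 1), repeat=r):
                    y = encode(x, z)
                    w = tuple(y[i] for i in S)
                    by_obs.setdefault(w, Counter())[x] += 1
            bad_mass = Fraction(0)
            for w, counts in by_obs.items():
                n_w = sum(counts.values())
                # statistical distance between X conditioned on w and uniform
                dist = sum(abs(Fraction(counts[x], n_w) - Fraction(1, len(messages)))
                           for x in messages) / 2
                if dist > eps:
                    bad_mass += Fraction(n_w, total)
            worst = max(worst, bad_mass)
    return worst

def toy_encoder(x, z):
    # Hypothetical encoder: seed bit followed by the masked message bit.
    return (z[0], x[0] ^ z[0])
```

Under these assumptions the toy encoder is a $(1, 0, 0)_2$-resilient protocol exactly when `worst_case_leakage(toy_encoder, 1, 1, 2, 1, 0)` is zero; with $t = 2$ every observation determines the message, so every observation is bad for any $\epsilon < 1/2$.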

The decodability condition ensures that the functions E and D are a
matching encoder/decoder pair, while the resiliency condition ensures that
the intruder learns almost nothing about the message from his observation.

In our definition, the imperfection of the protocol is captured by the two
parameters $\epsilon$ and $\gamma$. When $\epsilon = \gamma = 0$, the above definition coincides with the
original wiretap channel problem for the case of perfect privacy.

When $\gamma = 0$, we will have a worst-case guarantee, namely, that the
intruder's views of the message before and after his observation are statistically
close, regardless of the outcome of the observation.

The protocol remains interesting even when $\gamma$ is positive but sufficiently
small. When $\gamma > 0$, a particular observation might potentially reveal to the
intruder a lot of information about the message. However, a negligible $\gamma$
will ensure that such a bad event (or leakage) happens only with negligible
probability.

All the constructions that we will study in this chapter achieve zero leakage
(i.e., $\gamma = 0$), except for the general result in Section 3.7.3 for which a nonzero
leakage is inevitable.

The significance of zero-leakage protocols is that they assure adaptive
resiliency in the weak sense introduced in [47] for exposure-resilient functions: if
the intruder is given the encoded sequence as an oracle that he can adaptively
query at up to $t$ coordinates (that is, the choice of each query may depend on
the outcome of the previous queries), and is afterwards presented with a
challenge which is either the original message or an independent uniformly chosen
random string, he will not be able to distinguish between the two cases.

In general, it is straightforward to verify that our model can be used to
solve the original wiretap II problem, with $\Delta \ge m(1 - \epsilon - \gamma)$:

Lemma 3.3. Suppose that $(E, D)$ is an encoder/decoder pair as in
Definition 3.2. Then using $E$ and $D$ in the wiretap II problem attains an
equivocation

$$\Delta \ge m(1 - \epsilon - \gamma).$$

Proof. Let $W := Y|_S$ be the intruder's observation, and denote by $\mathcal{W}'$ the set
of good observations, namely,

$$\mathcal{W}' := \{w \in \Sigma^{|S|} \colon \mathrm{dist}(X_{S,w}, \mathcal{U}_{\Sigma^m}) \le \epsilon\}.$$

Denote by $H(\cdot)$ the Shannon entropy in $q$-ary symbols (where $q = |\Sigma|$). Then we will have

$$\begin{aligned}
H(X \mid W) &= \sum_{w \in \Sigma^{|S|}} \Pr(W = w) H(X \mid W = w) \\
&\ge \sum_{w \in \mathcal{W}'} \Pr(W = w) H(X \mid W = w) \\
&\overset{(a)}{\ge} \sum_{w \in \mathcal{W}'} \Pr(W = w)(1 - \epsilon)m \overset{(b)}{\ge} (1 - \gamma)(1 - \epsilon)m \ge (1 - \gamma - \epsilon)m.
\end{aligned}$$

The inequality (a) follows from the definition of $\mathcal{W}'$ combined with
Proposition 3.30 in the appendix, and (b) by the definition of the leakage parameter. □



Hence, we will achieve asymptotically perfect privacy when $\epsilon + \gamma = o(1/m)$.
For all the protocols that we present in this chapter this quantity will be
superpolynomially small; that is, smaller than $1/m^c$ for every positive constant
$c$ (provided that $m$ is large enough).

3.2 Review of the Related Notions in 
Cryptography 

There are several interrelated notions in the literature on Cryptography and
Theoretical Computer Science that are also closely related to our definition of
the wiretap protocol (Definition 3.2). These are resilient functions (RF) and
almost perfect resilient functions (APRF), exposure-resilient functions (ERF),
and all-or-nothing transforms (AONT) (cf. [22, 36, 61, 62, 96, 126, 143] and [45]
for a comprehensive account of several important results in this area).

The notion of resilient functions was introduced in [11] (and also [158]
as the bit-extraction problem). A deterministic polynomial-time computable
function $f\colon \{0,1\}^n \to \{0,1\}^m$ is called $t$-resilient if whenever any $t$ bits of
its input are arbitrarily chosen by an adversary and the rest of the bits
are chosen uniformly at random, then the output distribution of the function
is (close to) uniform. APRF is a stronger variation where the criterion for
uniformity of the output distribution is defined with respect to the $\ell_\infty$ distance (i.e.,
point-wise distance of distributions) rather than $\ell_1$. This stronger requirement
allows for an "adaptive security" of APRFs.
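For tiny input lengths, perfect (zero-error) resilience can be verified exhaustively. The following sketch is an illustration only; the function name `is_perfectly_resilient` and the three sample functions are hypothetical and not taken from the text.

```python
from collections import Counter
from itertools import combinations, product

def is_perfectly_resilient(f, n, m, t):
    """True if f: {0,1}^n -> {0,1}^m has exactly uniform output whenever
    any t input bits are fixed arbitrarily and the rest are uniform."""
    for S in combinations(range(n), t):
        free = [i for i in range(n) if i not in S]
        for fixed in product((0, 1), repeat=t):
            counts = Counter()
            for rest in product((0, 1), repeat=n - t):
                x = [0] * n
                for i, b in zip(S, fixed):
                    x[i] = b
                for i, b in zip(free, rest):
                    x[i] = b
                counts[f(tuple(x))] += 1
            # uniform means: all 2^m outputs occur, all equally often
            if len(counts) != 2 ** m or len(set(counts.values())) != 1:
                return False
    return True

def parity(x):
    return (sum(x) % 2,)

def majority3(x):
    return (1 if sum(x) >= 2 else 0,)

def pair_xor(x):
    return (x[0] ^ x[1], x[2] ^ x[3])
```

The parity of $n$ bits stays uniform as long as a single input bit remains random, so it is the standard example of an $(n-1)$-resilient function, whereas fixing one input of a three-bit majority already biases the output.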

ERFs, introduced in [22], are similar to resilient functions except that the
entire input is chosen uniformly at random, and the view of the adversary 
from the output remains (close to) uniform even after observing any t input 
bits of his choice. 

ERFs and resilient functions are known to be useful in a scenario similar to 
the wiretap channel problem where the two parties aim to agree on any random 
string, for example a session key (Alice generates x uniformly at random which 
she sends to Bob, and then they agree on the string $f(x)$). Here no control
on the content of the message is required, and the only goal is that at the end
of the protocol the two parties agree on any random string that is uniform
even conditioned on the observations of the intruder. Hence, Definition 3.2 of
a wiretap protocol is more stringent than that of resilient functions, since it 
requires the existence and efficient computability of the encoding function E 
that provides a control over the content of the message. 

Another closely related notion is that of all-or-nothing transforms, which
was suggested in [126] for protection of block ciphers. A randomized
polynomial-time computable function $f\colon \{0,1\}^m \to \{0,1\}^n$, ($m \le n$), is called a
(statistical, non-adaptive, and secret-only) $t$-AONT with error $\epsilon$ if it is
efficiently invertible and for every $S \subseteq [n]$ such that $|S| \le t$, and all $x_1, x_2 \in
\{0,1\}^m$ we have that the two distributions $f(x_1)|_S$ and $f(x_2)|_S$ are $\epsilon$-close.

An AONT with $\epsilon = 0$ is called perfect. It is easy to see that perfectly
private wiretap protocols are equivalent to perfect adaptive AONTs. It was
shown in [47] that such functions cannot exist (with positive, constant rate)
when the adversary is allowed to observe more than half of the encoded bits.
A similar result was obtained in [36] for the case of perfect linear RFs.



As pointed out in [47], AONTs can be used in the original scenario of
Ozarow and Wyner's wiretap channel problem. However, the best known
constructions of AONTs can achieve rate-resilience trade-offs that are far from
the information-theoretic optimum (see Figure 3.2).

While an AONT requires indistinguishability of the intruder's view for every
fixed pair $(x_1, x_2)$ of messages, the relaxed notion of average-case AONT
requires the expected distance of $f(x_1)|_S$ and $f(x_2)|_S$ to be at most $\epsilon$ for a
uniform random message pair. Hence, for a negligible $\epsilon$, the distance will be
negligible for all but a negligible fraction of message pairs. Up to a loss in
parameters, wiretap protocols are equivalent to average-case AONTs:

Lemma 3.4. Let $(E, D)$ be an encoding/decoding pair for a $(t, \epsilon, \gamma)_2$-resilient
wiretap protocol. Then $E$ is an average-case $t$-AONT with error at most
$2(\epsilon + \gamma)$. Conversely, an average-case $t$-AONT with error $\eta^2$ can be used as a
$(t, \eta, \eta)$-resilient wiretap encoder.



Proof. Consider a $(t, \epsilon, \gamma)_2$-resilient wiretap protocol as in Definition 3.2, and
accordingly, let the random variable $Y = E(X, R)$ denote the encoding of $X$
with a random seed $R$. For a set $S \subseteq [n]$ of size at most $t$, denote by $W := Y|_S$
the intruder's observation.

The resiliency condition implies that the set of bad observations $B_S$ has a
probability mass of at most $\gamma$ and hence, the expected distance $\mathrm{dist}(X|W, X)$
taken over the distribution of $W$ is at most $\epsilon + \gamma$. Now we can apply
Proposition 3.31 to the jointly distributed pair of random variables $(W, X)$, and
conclude that the expected distance $\mathrm{dist}(W|X, W)$ over the distribution of
$X$ (which is uniform) is at most $\epsilon + \gamma$. By the triangle inequality, applied to
a pair of independent uniform messages, this implies that the encoder is an
average-case $t$-AONT with error at most $2(\epsilon + \gamma)$.

Conversely, the same argument combined with Markov's bound shows that
an average-case $t$-AONT with error $\eta^2$ can be seen as a $(t, \eta, \eta)$-resilient wiretap
protocol. □

Note that the converse direction does not guarantee zero leakage, and 
hence, zero leakage wiretap protocols are in general stronger than average- 
case AONTs. An average-case to worst-case reduction for AONTs was shown 
in [22] which, combined with the above lemma, can be used to show that any 
wiretap protocol can be used to construct an AONT (at the cost of a rate
loss).



A simple universal transformation was proposed in [22] to obtain an AONT
from any ERF, by one-time padding the message with a random string
obtained from the ERF. In particular, given an ERF $f\colon \{0,1\}^n \to \{0,1\}^m$,
the AONT $g\colon \{0,1\}^m \to \{0,1\}^{m+n}$ is defined as $g(x) := (r, x + f(r))$, where
$r \in \{0,1\}^n$ is chosen uniformly at random. Hence, the ERF is used to one-time
pad the message with a random secret string.
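The transformation $g(x) := (r, x + f(r))$ can be sketched on a toy instance. In the illustration below the ERF is assumed to be the parity of $n$ bits (so $m = 1$), which is a hypothetical choice made only to keep the example small; the function names are likewise not taken from the text.

```python
from collections import Counter
from itertools import product

def parity_erf(r):
    # Toy ERF (assumption): parity of all input bits, output length m = 1.
    return (sum(r) % 2,)

def aont_encode(x, r, erf=parity_erf):
    """g(x) = (r, x + f(r)) with addition over F_2."""
    pad = erf(r)
    return tuple(r) + tuple(a ^ b for a, b in zip(x, pad))

def aont_decode(y, n, erf=parity_erf):
    r, masked = y[:n], y[n:]
    return tuple(a ^ b for a, b in zip(masked, erf(r)))

def view(x, n, S):
    """Distribution of g(x) restricted to the positions in S, over uniform r."""
    return Counter(tuple(aont_encode(x, r)[i] for i in S)
                   for r in product((0, 1), repeat=n))
```

For $n = 3$, comparing `view((0,), 3, S)` with `view((1,), 3, S)` for each set $S$ of three output positions shows whether the intruder's view depends on the message. The rate of this toy transform is $1/4$, which is consistent with the observation that the one-time padding strategy needs at least as many random bits as message bits.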

This construction can also yield a wiretap protocol with zero leakage.
However, it has the drawback of significantly weakening the rate-resilience
trade-off. Namely, even if an information-theoretically optimal ERF is used in this
reduction, the resulting wiretap protocol will only achieve half the optimal
rate (see Figure 3.2). This is because the one-time padding strategy
necessarily requires a random seed that is at least as long as the message itself, even
if the intruder is restricted to observe only a small fraction of the transmitted
sequence. Hence the rate of the resulting AONT cannot exceed 1/2, and it is
not clear how to improve this universal transformation to obtain a worst-case
AONT using a shorter seed.

The main focus of this chapter is on asymptotic trade-offs between the rate
$R$ and the resilience $\delta$ of an asymptotically perfectly private wiretap protocol.
For applications in cryptography, e.g., the context of ERFs or AONTs, it is
typically assumed that the adversary learns all but a small number of the
bits in the encoded sequence, and the incurred blow-up in the encoding is not
as crucially important, as long as it remains within a reasonable range. On
the other hand, as in this chapter we are motivated by the wiretap channel
problem which is a communication problem, optimizing the transmission rate
will be the most important concern for us. We will focus on the case where
the fraction $\delta$ of the symbols observed by the intruder is an arbitrary constant
below 1, which is the most interesting range in our context. However, some
of our constructions work for sub-constant $1 - \delta$ as well.



Following [116], it is easy to see that, for resilience $\delta$, an
information-theoretic bound $R \le 1 - \delta + o(1)$ must hold. Lower bounds for $R$ in terms of
$\delta$ have been studied by a number of researchers.

For the case of perfect privacy (where the equivocation $\Delta$ is equal to the
message length $m$), Ozarow and Wyner [116] give a construction of a wiretap
protocol using linear error-correcting codes, and show that the existence of an
$[n, k, d]_q$-code implies the existence of a perfectly private, $(d - 1, 0, 0)_q$-resilient
wiretap protocol with message length $k$ and block length $n$ (thus, rate $k/n$).
As a result, the so-called Gilbert-Varshamov bound on the rate-distance
trade-offs of linear codes (see Chapter 6) implies that, asymptotically,
$R \ge 1 - h_q(\delta)$, where $h_q$ is the $q$-ary entropy function defined as

$$h_q(x) := x \log_q(q - 1) - x \log_q(x) - (1 - x) \log_q(1 - x).$$

If $q \ge 49$ is a square, the bound can be further improved to
$R \ge 1 - \delta - 1/(\sqrt{q} - 1)$ using Goppa's algebraic-geometric codes [72, 154]. In these
protocols, the encoder can be seen as an adaptively secure, perfect AONT and the
decoder is an adaptive perfect RF.

Figure 3.2: A comparison of the rate vs. resilience trade-offs achieved by the
wiretap protocols for the binary alphabet (left) and larger alphabets (right,
in this example of size 64). (1) Information-theoretic bound, attained by
Theorem 3.25; (2) The bound approached by [96]; (3) Protocol based on
best non-explicit binary linear codes [68, 157]; (4) AONT construction of [22],
assuming that the underlying ERF is optimal; (5) Random walk protocol of
Corollary 3.19; (6) Protocol based on the best known explicit [154] and
non-explicit [68, 157] linear codes.
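The two coding-theoretic rate bounds mentioned here are easy to evaluate numerically. The sketch below is only an illustration; the function names are hypothetical, and the sample values $q = 64$ and $\delta = 0.5$ are chosen to match the alphabet size used in Figure 3.2.

```python
import math

def h_q(x, q):
    """q-ary entropy function, with the convention 0 * log(0) = 0."""
    def xlog(y):
        return 0.0 if y == 0 else y * math.log(y, q)
    return x * math.log(q - 1, q) - xlog(x) - xlog(1 - x)

def gv_rate(delta, q):
    """Rate guaranteed by the Gilbert-Varshamov bound: 1 - h_q(delta)."""
    return 1 - h_q(delta, q)

def ag_rate(delta, q):
    """Rate from algebraic-geometric codes (q a square): 1 - delta - 1/(sqrt(q) - 1)."""
    return 1 - delta - 1 / (math.sqrt(q) - 1)

# Sample evaluation for a hypothetical resilience value.
delta, q = 0.5, 64
rates = {"GV": gv_rate(delta, q), "AG": ag_rate(delta, q), "upper": 1 - delta}
```

At this sample point both lower bounds stay below the information-theoretic limit $1 - \delta$, and the algebraic-geometric bound is the larger of the two.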

Moving away from perfect to asymptotically perfect privacy, it was shown
in [96] that for any $\gamma > 0$ there exist binary asymptotically perfectly private
wiretap protocols with $R \ge 1 - 2\delta - \gamma$ and exponentially small error³. This
bound strictly improves the coding-theoretic bound of Ozarow and Wyner for
the binary alphabet.



³ Actually, what is proved in this paper is the existence of $t$-resilient functions which
correspond to decoders in our wiretap setting; however, it can be shown that these functions
also possess efficient encoders, so that it is possible to construct wiretap protocols from
them.
3.3 Symbol-Fixing and Affine Extractors

Two central notions for our constructions of wiretap protocols in this chapter 
are symbol-fixing and affine extractors. In this section, we introduce these 
notions, and study some basic constructions. 

Definition 3.5. A $d$-ary symbol-fixing source is an imperfect source of random
symbols from an alphabet of size $d$, that may fix some bounded number of the
symbols to unknown values. More precisely, an $(n, k)_d$ symbol-fixing source is
the distribution of a random variable $X = (X_1, X_2, \ldots, X_n) \in \Sigma^n$, for some
set $\Sigma$ of size $d$, in which at least $k$ of the coordinates (chosen arbitrarily) are
uniformly and independently distributed on $\Sigma$ and the rest take deterministic
values.

When $d = 2$, we will have a binary symbol-fixing source, or simply a
bit-fixing source. In this case $\Sigma = \{0, 1\}$, and the subscript $d$ is dropped from the
notation.

The min-entropy of an $(n, k)_d$ symbol-fixing source is $k \log_2 d$ bits. For a
$d$-ary source with $d \ne 2$, it is more convenient to talk about the $d$-ary entropy
of the source, which is $k$ (in $d$-ary symbols).

Affine sources are natural generalizations of symbol-fixing sources when 
the alphabet size is a prime power. 

Definition 3.6. For a prime power $q$, an $(n, k)_q$ affine source is a
distribution on $\mathbb{F}_q^n$ that is uniformly supported on an affine translation of some
$k$-dimensional subspace of $\mathbb{F}_q^n$.

It is easy to see that the $q$-ary min-entropy of a $k$-dimensional affine source
is $k$. Due to the restricted structure of symbol-fixing and affine sources, it is
possible to construct seedless extractors for such sources:

Definition 3.7. Let $\Sigma$ be a finite alphabet of size $d > 1$. A function $f\colon \Sigma^n \to
\Sigma^m$ is a (seedless) $(k, \epsilon)$-extractor for symbol-fixing (resp., affine) sources on $\Sigma^n$
if for every $(n, k)_d$ symbol-fixing (resp., affine) source $X$, the distribution $f(X)$
is $\epsilon$-close to the uniform distribution $\mathcal{U}_{\Sigma^m}$. The extractor is called explicit if
it is deterministic and polynomial-time computable.

We will shortly see simple constructions of zero-error, symbol-fixing and 
affine extractors using linear functions arising from good error-correcting codes. 
These extractors achieve the lowest possible error, but are unable to
extract the entire source entropy. Moreover, the affine extractor only works for 
a "restricted" class of affine sources. For unrestricted affine sources, there are 
by now various constructions of extractors in the literature. Here we review 
some notable examples that are most useful for the construction of wiretap 
protocols that we will discuss in this chapter. 

Over large fields, the following affine extractor due to Gabizon and Raz
extracts almost the entire source entropy:

Theorem 3.8. [65] There is a constant $q_0$ such that for any prime power field
size $q$ and integers $n, k$ such that $q > \max\{q_0, n^{20}\}$, there is an explicit affine
$(k, \epsilon)$-extractor $f\colon \mathbb{F}_q^n \to \mathbb{F}_q^{k-1}$, where $\epsilon < q^{-1/21}$. □

In this construction, the field size has to be polynomially large in $n$. When
the field size is small (in particular, constant), the task becomes much more
challenging. The most challenging case thus corresponds to the binary field
$\mathbb{F}_2$, for which an explicit affine extractor was obtained, when the input entropy
is a constant fraction of the input length, by Bourgain:

Theorem 3.9. [16] For every constant $0 < \delta \le 1$, there is an explicit affine
extractor $\mathrm{AExt}\colon \mathbb{F}_2^n \to \mathbb{F}_2^m$ for min-entropy $\delta n$ with output length $m = \Omega(n)$
and error at most $2^{-\Omega(m)}$. □

Bourgain's construction was recently simplified, improved, and extended
to work for arbitrary prime fields by Yehudayoff [167].

An "intermediate" trade-off was recently obtained by DeVos and Gabizon
[44], albeit with a short output length. This explicit construction extracts one
unbiased bit from any $(n, k)_q$ affine source provided that, for $d := 5n/k$, we
have $q > 2d^2$ and the characteristic of the field is larger than $d$.

3.3.1 Symbol-Fixing Extractors from Linear Codes 

The simple theorem below states that linear error-correcting codes can be used 
to obtain symbol-fixing extractors with zero error. 

Theorem 3.10. Let $C$ be an $[n, k, d]_q$ code over $\mathbb{F}_q$ and $G$ be a $k \times n$ generator
matrix of $C$. Then, the function $E\colon \mathbb{F}_q^n \to \mathbb{F}_q^k$ defined as⁴ $E(x) := Gx^\top$ is an
$(n - d + 1, 0)$-extractor for symbol-fixing sources over $\mathbb{F}_q$.

Conversely, if a linear function $E\colon \mathbb{F}_q^n \to \mathbb{F}_q^k$ is an $(n - d + 1, 0)$-extractor
for symbol-fixing sources over $\mathbb{F}_q$, it corresponds to a generator matrix of an
$[n, k, d]_q$ code.

Proof. Let $X$ be a symbol-fixing source with a set $S \subseteq [n]$ of fixed coordinates,
where⁵ $|S| = d - 1$, and define $\bar{S} := [n] \setminus S$. Observe that, by the Singleton
bound, we must have $|\bar{S}| = n - d + 1 \ge k$.

The submatrix of $G$ obtained by removing the columns picked by $S$ must
have rank $k$. Otherwise, the left kernel of this submatrix would be
nonzero, meaning that $C$ has a nonzero codeword that consists entirely of
zeros at the $n - d + 1$ positions picked by $\bar{S}$ and thus has weight at most $d - 1$,
contradicting the assumption that the minimum distance of $C$ is $d$. Therefore,
the distribution $E(X)$ is supported on a $k$-dimensional affine space on $\mathbb{F}_q^k$,
meaning that this distribution is uniform.



⁴ We typically consider vectors to be represented in row form, and use the transpose operator
($x^\top$) to represent column vectors.

⁵ If the set of fixed symbols is of size smaller than $d - 1$, the argument still goes through
by taking $S$ as an arbitrary set of size $d - 1$ containing all the fixed coordinates.

The converse is straightforward by following the same argument. □
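Theorem 3.10 can be checked by brute force on a small code. The sketch below assumes the binary $[7, 4, 3]$ Hamming code in a standard systematic form (this particular generator matrix is an illustrative choice, not taken from the text) and verifies that $E(x) = Gx^\top$ is exactly uniform for every bit-fixing source with $d - 1 = 2$ fixed coordinates.

```python
from collections import Counter
from itertools import combinations, product

# Systematic generator matrix of a [7, 4, 3] binary Hamming code (assumed example).
G = [
    [1, 0, 0, 0, 1, 1, 0],
    [0, 1, 0, 0, 1, 0, 1],
    [0, 0, 1, 0, 0, 1, 1],
    [0, 0, 0, 1, 1, 1, 1],
]

def extract(x):
    """E(x) = G x^T over F_2."""
    return tuple(sum(g * b for g, b in zip(row, x)) % 2 for row in G)

def is_uniform_for(fixed_positions, n=7, k=4):
    """True if E(X) is exactly uniform for every assignment of the fixed positions."""
    free = [i for i in range(n) if i not in fixed_positions]
    for fixed in product((0, 1), repeat=len(fixed_positions)):
        counts = Counter()
        for rest in product((0, 1), repeat=len(free)):
            x = [0] * n
            for i, b in zip(fixed_positions, fixed):
                x[i] = b
            for i, b in zip(free, rest):
                x[i] = b
            counts[extract(x)] += 1
        if len(counts) != 2 ** k or len(set(counts.values())) != 1:
            return False
    return True

all_pairs_ok = all(is_uniform_for(S) for S in combinations(range(7), 2))
```

Fixing three coordinates can break uniformity: the first row of `G` is a codeword of weight 3 supported on positions 0, 4, and 5, so `is_uniform_for((0, 4, 5))` fails, in line with the converse direction of the theorem.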

If the field size is large enough; e.g., q > n, then one can pick C in the above 
theorem to be an MDS code (in particular, a Reed-Solomon code) to obtain a 
(k, 0)-extractor for all symbol-fixing sources of entropy k with optimal output 
length k. However, for a fixed q, negative results on the rate-distance trade- 
offs of codes (e.g., Hamming, MRRW, and Plotkin bounds) assert that this 
construction of extractors must inevitably lose some fraction of the entropy of 
the source. Moreover, the construction would at best be able to extract some 
constant fraction of the source entropy only if the entropy of the source (in 
q-ary symbols) is above n/q. 

3.3.2 Restricted Affine Extractors from Rank-Metric Codes 



In Section 3.7, we will see that affine extractors can be used to construct 
wiretap schemes for models that are more general than the original Wiretap II 
problem, e.g., when the direct channel is noisy. For these applications, the 
extractor needs to additionally have a nice structure that is in particular 
offered by linear functions. 

An obvious observation is that a nontrivial affine extractor cannot be a
linear function. Indeed, a linear function $f(x) := \langle \alpha, x \rangle + \beta$, where $\alpha, x \in
\mathbb{F}_q^n$ and $\beta \in \mathbb{F}_q$, is constant on the $(n - 1)$-dimensional orthogonal subspace of $\alpha$, and thus,
fails to be an extractor for even $(n - 1)$-dimensional affine spaces. However,
in this section we will see that linear affine extractors can be constructed if
the affine source is known to be described by a set of linear constraints whose
coefficients lie on a small subfield of the underlying field. Such restricted
extractors turn out to be sufficient for some of the applications that we will
consider.

Let $Q$ be a prime power. As with linear codes, an affine subspace of
$\mathbb{F}_Q^n$ can be represented by a generator matrix, or by a parity-check matrix and
a constant shift. That is, a $k$-dimensional affine subspace $A \subseteq \mathbb{F}_Q^n$ can be
described as the image of a linear mapping

$$A := \{xG + \beta : x \in \mathbb{F}_Q^k\},$$

where $G$ is a $k \times n$ generator matrix of rank $k$ over $\mathbb{F}_Q$, and $\beta \in \mathbb{F}_Q^n$ is a
fixed vector. Alternatively, $A$ can be expressed as the translated null-space of
a linear mapping

$$A := \{x + \beta \in \mathbb{F}_Q^n : Hx^\top = 0\},$$

for an $(n-k) \times n$ parity check matrix $H$ of rank $n-k$ over $\mathbb{F}_Q$.

Observe that a symbol-fixing source over $\mathbb{F}_q$ with $q$-ary min-entropy $k$ can
be seen as a $k$-dimensional affine source with a generator matrix of the form
$[I \mid 0] \cdot P$, where $I$ is the $k \times k$ identity matrix, $0$ denotes the $k \times (n-k)$
all-zeros matrix, and $P$ is a permutation matrix. Recall from Theorem 3.10
that linear extractors exist for this restricted type of affine sources.
In this section we generalize this idea.

Suppose that $Q = q^m$ for a prime power $q$, so that $\mathbb{F}_Q$ can be regarded
as a degree $m$ extension of $\mathbb{F}_q$ (and is isomorphic to $\mathbb{F}_q^m$ as a vector space). Let $A$ be an affine
source over $\mathbb{F}_Q$. We will call the affine source $\mathbb{F}_q$-restricted if its support can
be represented by a generator matrix (or equivalently, a parity check matrix)
over $\mathbb{F}_q$.

In this section we introduce an affine extractor that is $\mathbb{F}_Q$-linear and,
assuming that $m$ is sufficiently large, extracts from $\mathbb{F}_q$-restricted affine sources.



The construction of the extractor is similar to Theorem 3.10, except that 
instead of an error-correcting code defined over the Hamming metric, we will 
use rank-metric codes. 

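Consider the function $\mathrm{rdist}\colon \mathbb{F}_q^{m \times n} \times \mathbb{F}_q^{m \times n} \to \mathbb{Z}$, where $\mathbb{F}_q^{m \times n}$ denotes the
set of $m \times n$ matrices over $\mathbb{F}_q$, defined as $\mathrm{rdist}(A, B) := \mathrm{rank}_q(A - B)$, where
$\mathrm{rank}_q$ is the matrix rank over $\mathbb{F}_q$. It is straightforward to see that rdist is a
metric.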
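As a concrete illustration of the rank distance, the sketch below computes it over a prime field $\mathbb{F}_p$ by Gaussian elimination. The function names and the restriction to prime $p$ are assumptions made for the example (the definition above allows any prime power $q$), and the modular inverse via `pow(a, -1, p)` requires Python 3.8 or later.

```python
def rank_mod_p(M, p):
    """Rank of the matrix M (a list of rows) over the prime field F_p."""
    rows = [[v % p for v in r] for r in M]
    ncols = len(rows[0]) if rows else 0
    rank = 0
    for col in range(ncols):
        # Find a row at or below position `rank` with a nonzero entry in this column.
        pivot = next((r for r in range(rank, len(rows)) if rows[r][col]), None)
        if pivot is None:
            continue
        rows[rank], rows[pivot] = rows[pivot], rows[rank]
        inv = pow(rows[rank][col], -1, p)          # modular inverse of the pivot
        rows[rank] = [(v * inv) % p for v in rows[rank]]
        for r in range(len(rows)):
            if r != rank and rows[r][col]:
                f = rows[r][col]
                rows[r] = [(a - f * b) % p for a, b in zip(rows[r], rows[rank])]
        rank += 1
    return rank

def rdist(A, B, p):
    """Rank distance rdist(A, B) = rank_p(A - B) between two equal-sized matrices."""
    diff = [[(a - b) % p for a, b in zip(ra, rb)] for ra, rb in zip(A, B)]
    return rank_mod_p(diff, p)

# Small example over F_2.
A = [[1, 0, 1], [0, 1, 1]]
C = [[1, 1, 0], [1, 1, 0]]
Z = [[0, 0, 0], [0, 0, 0]]
```

In this example `C` has two nonzero columns but rank one, which illustrates that the rank weight of a matrix never exceeds the number of its nonzero columns.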

The usual notion of error-correcting codes defined under the Hamming
metric can be naturally extended to the rank metric. In particular, a
rank-metric code $C$ can be defined as a set of $m \times n$ matrices (known as codewords),
whose minimum distance is the minimum rank distance between pairs of
codewords.

For $Q := q^m$, there is a natural correspondence between $m \times n$ matrices over
$\mathbb{F}_q$ and vectors of length $n$ over $\mathbb{F}_Q$. Consider an isomorphism $\varphi\colon \mathbb{F}_Q \to \mathbb{F}_q^m$
between $\mathbb{F}_Q$ and $\mathbb{F}_q^m$ which maps elements of $\mathbb{F}_Q$ to column vectors of length
$m$ over $\mathbb{F}_q$. Then one can define a mapping $\Phi\colon \mathbb{F}_Q^n \to \mathbb{F}_q^{m \times n}$ as

$$\Phi(x_1, \ldots, x_n) := [\varphi(x_1) \mid \cdots \mid \varphi(x_n)]$$

to put the elements of $\mathbb{F}_Q^n$ in one-to-one correspondence with $m \times n$ matrices
over $\mathbb{F}_q$.
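For $q = 2$ the correspondence is easy to make concrete. The sketch below assumes that elements of $\mathbb{F}_{2^m}$ are stored as integers whose bits are the coordinates with respect to a polynomial basis; this representation and the names `phi`, `Phi` and `Phi_inverse` are illustrative assumptions rather than notation fixed by the text.

```python
def phi(x, m):
    """Coordinate (column) vector of x in F_{2^m}: bit i is the coefficient of a^i."""
    return [(x >> i) & 1 for i in range(m)]

def Phi(vec, m):
    """m x n binary matrix whose j-th column is phi(vec[j])."""
    cols = [phi(x, m) for x in vec]
    return [[col[i] for col in cols] for i in range(m)]

def Phi_inverse(M):
    """Recover the length-n vector over F_{2^m} from its m x n binary matrix."""
    m, n = len(M), len(M[0])
    return [sum(M[i][j] << i for i in range(m)) for j in range(n)]

x = [1, 6, 3]   # a vector of length n = 3 over F_8
X = Phi(x, 3)   # its 3 x 3 matrix form over F_2
```

Since addition in $\mathbb{F}_{2^m}$ is bitwise XOR under this representation, `Phi` is $\mathbb{F}_2$-linear regardless of which irreducible polynomial defines the multiplication.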

A particular class of rank-metric codes are linear ones. Suppose that $C$
is a linear $[n, k, \tilde{d}]_Q$ code over $\mathbb{F}_Q$. Then, using $\Phi(\cdot)$, $C$ can be regarded as a
rank-metric code of dimension $k$ over $\mathbb{F}_q^{m \times n}$. In symbols, we will denote such
a linear $k$-dimensional rank-metric code as an $[[n, k, d]]_{q^m}$ code, where $d$ is the
minimum rank-distance of the code. The rank-distance of a linear rank-metric
code turns out to be equal to the minimum rank of its nonzero codewords and
obviously, one must have $d \le \tilde{d}$. However, the Hamming distance of $C$ might
turn out to be much larger than its rank distance when regarded as a
rank-metric code. In particular, $d \le m$, and thus, $d$ must be strictly smaller than
$\tilde{d}$ when the degree $m$ of the field extension is less than $\tilde{d}$.

A counterpart of the Singleton bound in the rank metric states that, for
any $[[n, k, d]]_{q^m}$ code, one must have $d \le n - k + 1$. Rank-metric codes that
attain equality exist and are called maximum rank distance (MRD) codes.
A class of linear rank-metric codes known as Gabidulin codes [64] are MRD
and can be thought of as the counterpart of Reed-Solomon codes in the rank
metric. In particular, the codewords of a Gabidulin code, seen as vectors
over the extension field, are evaluation vectors of bounded-degree linearized
polynomials rather than arbitrary polynomials as in the case of Reed-Solomon
codes. These codes are defined for any choice of $n, k, q, m$ as long as $m \ge n$
and $k \le n$.



The following is an extension of Theorem 3.10 to restricted affine sources.

Theorem 3.11. Let $C$ be an $[[n, k, d]]_{q^m}$ code defined from a code over $\mathbb{F}_Q$
(where $Q := q^m$) with a generator matrix $G \in \mathbb{F}_Q^{k \times n}$. Then the function
$E\colon \mathbb{F}_Q^n \to \mathbb{F}_Q^k$ defined as $E(x) := Gx^\top$ is an $(n - d + 1, 0)$-extractor for
$\mathbb{F}_q$-restricted affine sources over $\mathbb{F}_Q$.

Conversely, if a linear function $E\colon \mathbb{F}_Q^n \to \mathbb{F}_Q^k$ is an $(n - d + 1, 0)$-extractor for
all $\mathbb{F}_q$-restricted affine sources over $\mathbb{F}_Q$, it corresponds to a generator matrix
of an $[[n, k, d]]_{q^m}$ code.

Proof. Consider a restricted affine source $X$ uniformly supported on an affine
subspace of dimension$^{6}$ $n - d + 1$,

$$X := \{xA + \beta : x \in \mathbb{F}_Q^{n-d+1}\},$$

where $A \in \mathbb{F}_q^{(n-d+1) \times n}$ has rank $n - d + 1$, and $\beta \in \mathbb{F}_Q^n$ is a fixed translation.

Note that $k \le n - d + 1$ by the Singleton bound for rank-metric codes.

The output of the extractor is thus uniformly supported on the affine
subspace

$$B := \{GA^\top x^\top + G\beta^\top : x \in \mathbb{F}_Q^{n-d+1}\} \subseteq \mathbb{F}_Q^k.$$

Note that $GA^\top \in \mathbb{F}_Q^{k \times (n-d+1)}$. Our goal is to show that the dimension
of $B$ is equal to $k$. Suppose not; then we must have $\mathrm{rank}_Q(GA^\top) < k$. In
particular, there is a nonzero $y \in \mathbb{F}_Q^k$ such that $yGA^\top = 0$.

Let $Y := \Phi(yG) \in \mathbb{F}_q^{m \times n}$, where $\Phi(\cdot)$ is the isomorphism that maps
codewords of $C$ to their matrix form over $\mathbb{F}_q$. By the distance of $C$, we know that
$\mathrm{rank}_q(Y) \ge d$. Since $m \ge d$, this means that $Y$ has at least $d$ linearly
independent rows. On the other hand, since the entries of $A$ lie in $\mathbb{F}_q$ and
$yGA^\top = 0$, we know that the matrix $YA^\top \in \mathbb{F}_q^{m \times (n-d+1)}$
is the zero matrix. Therefore, $Y$ has $d$ independent rows (each in $\mathbb{F}_q^n$) that are
all orthogonal to the $n - d + 1$ independent rows of $A$. Since $d + (n - d + 1) > n$,
this is a contradiction.

Therefore, the dimension of $B$ is exactly $k$, meaning that the output
distribution of the extractor is indeed uniform. The converse is straightforward
by following a similar line of argument. □



6 The argument still holds if the dimension of $X$ is more than $n - d + 1$.

Thus, in particular, we see that generator matrices of MRD codes can be
used to construct linear extractors for restricted affine sources that extract
the entire source entropy with zero error. This is possible provided that the
field size is large enough compared to the field size required to describe the
generator matrix of the affine source. Using Gabidulin's rank metric codes,
we immediately obtain the following corollary of Theorem 3.11:

Corollary 3.12. Let $q$ be a prime power. Then for every positive integer
$n$, $k \le n$, and $Q := q^n$, there is a linear function $f\colon \mathbb{F}_Q^n \to \mathbb{F}_Q^k$ that is a
$(k, 0)$-extractor for $\mathbb{F}_q$-restricted affine sources over $\mathbb{F}_Q$. □
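The smallest interesting case of this corollary can be verified exhaustively. The following sketch assumes $q = 2$, $n = m = 3$, $k = 2$, the field $\mathbb{F}_8 = \mathbb{F}_2[a]/(a^3 + a + 1)$ with elements stored as integers, and a Gabidulin generator matrix built from the basis $(1, a, a^2)$; all names and parameter choices are illustrative assumptions. It enumerates every $\mathbb{F}_2$-restricted affine source of dimension 2 and also exhibits one unrestricted source on which the same linear map fails.

```python
from itertools import product

M_DEG, MOD = 3, 0b1011   # F_8 = F_2[a]/(a^3 + a + 1); elements are the ints 0..7

def gf_mul(x, y):
    """Product of two elements of F_8 (carry-less multiply, reduced mod a^3 + a + 1)."""
    r = 0
    while y:
        if y & 1:
            r ^= x
        y >>= 1
        x <<= 1
        if x >> M_DEG:
            x ^= MOD
    return r

def gf_dot(u, v):
    """Inner product of two vectors over F_8 (addition is XOR)."""
    acc = 0
    for a, b in zip(u, v):
        acc ^= gf_mul(a, b)
    return acc

n, k = 3, 2
g = [1, 2, 4]                            # 1, a, a^2: linearly independent over F_2
G = [g, [gf_mul(x, x) for x in g]]       # rows g^(2^0) and g^(2^1): Gabidulin generator

def extractor(x):
    """E(x) = G x^T over F_8."""
    return tuple(gf_dot(row, x) for row in G)

def source_points(A, beta):
    """All points xA + beta with x ranging over F_8^k, for a k x n matrix A."""
    for x in product(range(8), repeat=len(A)):
        yield [gf_dot(x, [row[j] for row in A]) ^ beta[j] for j in range(n)]

# Every rank-2 matrix A with 0/1 entries gives an F_2-restricted source.
rows = [r for r in product((0, 1), repeat=n) if any(r)]
restricted_ok = all(
    len({extractor(pt) for pt in source_points([r1, r2], beta=[3, 5, 6])}) == 8 ** k
    for r1 in rows for r2 in rows if r1 != r2
)

# Contrast: an unrestricted source containing a kernel direction of G.
v = next(list(c) for c in product(range(8), repeat=n)
         if any(c) and extractor(c) == (0, 0))
unrestricted_outputs = {extractor(pt)
                        for pt in source_points([v, [1, 0, 0]], beta=[0, 0, 0])}
```

In this toy setting the restricted sources always produce all 64 outputs exactly once, while the unrestricted source collapses onto a one-dimensional set, in line with the earlier observation that a linear function cannot be an extractor for general affine sources.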



It can be shown using similar proofs that if, in Theorems 3.10 and 3.11,
a parity check matrix of the code is used instead of a generator matrix, the
resulting linear function would become a lossless $(d-1, 0)$-condenser rather
than an extractor. This is in fact part of a more general "duality" phenomenon 
that is discussed in Section [BT5l 



3.4 Inverting Extractors 

In this section we will introduce the notion of invertible extractors and its
connection with wiretap protocols$^{7}$. Later we will use this connection to construct
wiretap protocols with good rate-resilience trade-offs.

Definition 3.13. Let $\Sigma$ be a finite alphabet and $f$ be a mapping from $\Sigma^n$ to
$\Sigma^m$. For $\gamma \ge 0$, a function $A\colon \Sigma^m \times \{0,1\}^r \to \Sigma^n$ is called a $\gamma$-inverter for $f$
if the following conditions hold:

(a) (Inversion) Given $x \in \Sigma^m$ such that $f^{-1}(x)$ is nonempty, for every
$z \in \{0,1\}^r$ we have $f(A(x, z)) = x$.

(b) (Uniformity) $A(\mathcal{U}_{\Sigma^m}, \mathcal{U}_r) \sim_\gamma \mathcal{U}_{\Sigma^n}$.

A $\gamma$-inverter is called efficient if there is a randomized algorithm that runs
in worst case polynomial time and, given $x \in \Sigma^m$ and $z$ as a random seed,
computes $A(x, z)$. We call a mapping $\gamma$-invertible if it has an efficient
$\gamma$-inverter, and drop the prefix $\gamma$ from the notation when it is zero.

The parameter $r$ in the above definition captures the number of random
bits that the inverter (seen as a randomized algorithm) needs to receive. For
our applications, no particular care is needed to optimize this parameter and,
as long as $r$ is polynomially bounded in $n$, it is generally ignored.

7 Another notion of invertible extractors was introduced in [46] and used in [48] for
a different application (entropic security) that should not be confused with the one we
use. Their notion applies to seeded extractors with long seeds that are efficiently invertible
bijections for every fixed seed. Such extractors can be seen as a single-step walk on highly
expanding graphs that mix in one step. This is in a way similar to the multiple-step random
walk used in the seedless extractor of Section 3.5, which can be regarded as a single-step walk
on the expander graph raised to a certain power.

Remark 3.14. If a function $f$ maps the uniform distribution to a
distribution that is $\epsilon$-close to uniform (as is the case for all extractors), then any
randomized mapping that maps its input $x$ to a distribution that is $\gamma$-close to
the uniform distribution on $f^{-1}(x)$ is easily seen to be an $(\epsilon + \gamma)$-inverter for
$f$. In some situations designing such a function might be easier than directly
following the above definition.



The idea of random pre-image sampling was proposed in [47] for the construction
of adaptive AONTs from APRFs. However, they ignored the efficiency
of the inversion, as their goal was to show the existence of (not necessarily 
efficient) information-theoretically optimal adaptive AONTs. Moreover, the 
strong notion of APRF and a perfectly uniform sampler is necessary for their 
construction of AONTs. As wiretap protocols are weaker than (worst-case) 
AONTs, they can be constructed from slightly imperfect inverters as shown 
by the following lemma. 

Lemma 3.15. Let $\Sigma$ be an alphabet of size $q > 1$ and $f\colon \Sigma^n \to \Sigma^m$ be a
$(\gamma^2/2)$-invertible $q$-ary $(k, \epsilon)$ symbol-fixing extractor. Then, $f$ and its inverter
can be seen as a decoder/encoder pair for an $(n - k, \epsilon + \gamma, \gamma)_q$-resilient wiretap
protocol with block length $n$ and message length $m$.

Proof. Let $E$ and $D$ denote the wiretap encoder and decoder, respectively.
Hence, $E$ is the $(\gamma^2/2)$-inverter for $f$, and $D$ is the extractor $f$ itself. From
the definition of the inverter, for every $x \in \Sigma^m$ and every random seed $r$, we
have $D(E(x, r)) = x$. Hence it is sufficient to show that the pair satisfies the
resiliency condition.

Let the random variable $X$ be uniformly distributed on $\Sigma^m$ and the seed
$R \in \{0,1\}^r$ be chosen uniformly at random. Denote the encoding of $X$ by
$Y := E(X, R)$. Fix any $S \subseteq [n]$ of size at most $n - k$.

For every $w \in \Sigma^{|S|}$, let $Y_w$ denote the set $\{y \in \Sigma^n : (y|_S) = w\}$. Note that
the sets $Y_w$ partition the space $\Sigma^n$ into $|\Sigma|^{|S|}$ disjoint sets.

Let $\mathcal{Y}$ and $\mathcal{Y}_S$ denote the distribution of $Y$ and $Y|_S$, respectively. The
inverter guarantees that $\mathcal{Y}$ is $(\gamma^2/2)$-close to uniform. Applying
Proposition 3.32, we get that

$$\sum_{w \in \Sigma^{|S|}} \Pr[(Y|_S) = w] \cdot \mathrm{dist}\bigl((\mathcal{Y}|Y_w), \mathcal{U}_{Y_w}\bigr) \le \gamma^2.$$

The left hand side is the expectation of $\mathrm{dist}((\mathcal{Y}|Y_w), \mathcal{U}_{Y_w})$. Denote by $W$ the
set of all bad outcomes of $Y|_S$, i.e.,

$$W := \{w \in \Sigma^{|S|} \mid \mathrm{dist}\bigl((\mathcal{Y}|Y_w), \mathcal{U}_{Y_w}\bigr) > \gamma\}.$$

By Markov's inequality, we conclude that

$$\Pr[(Y|_S) \in W] \le \gamma.$$
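For completeness, the Markov step can be written out as follows. This is only a restatement of the bound just derived, with the expectation taken over the outcome $w$ of $Y|_S$; the symbols $\mathcal{Y}$, $Y_w$ and $\mathcal{U}_{Y_w}$ are used as in the surrounding proof.

```latex
\Pr[(Y|_S) \in W]
  = \Pr_{w}\Bigl[\mathrm{dist}\bigl((\mathcal{Y}|Y_w), \mathcal{U}_{Y_w}\bigr) > \gamma\Bigr]
  \le \frac{\mathbb{E}_{w}\Bigl[\mathrm{dist}\bigl((\mathcal{Y}|Y_w), \mathcal{U}_{Y_w}\bigr)\Bigr]}{\gamma}
  \le \frac{\gamma^2}{\gamma}
  = \gamma .
```

This is why the lemma asks for a $(\gamma^2/2)$-inverter: the quadratic slack is what Markov's inequality converts into both a failure probability of $\gamma$ and a conditional distance of $\gamma$.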

For every $w \notin W$, the distribution of $Y$ conditioned on the event $(Y|_S) = w$
is $\gamma$-close to a symbol-fixing source with $n - |S| \ge k$ random symbols. The
fact that $D$ is a symbol-fixing extractor for this entropy and Proposition 3.33
imply that, for any such $w$, the conditional distribution of $D(Y) \mid (Y|_S = w)$ is
$(\gamma + \epsilon)$-close to uniform. Hence with probability at least $1 - \gamma$ the distribution
of $X$ conditioned on the outcome of $Y|_S$ is $(\gamma + \epsilon)$-close to uniform. This
ensures the resiliency of the protocol. □



By combining Lemma 3.15 and Theorem 3.10 using a Reed-Solomon code,
we can obtain a perfectly private, rate-optimal, wiretap protocol for the
Wiretap II problem over large alphabets (namely, $q > n$), and recover the original
result of Ozarow and Wyner [116]$^{8}$:

Corollary 3.16. For every positive integer $n$, prime power $q > n$, and
$\delta \in [0, 1)$, there is a $(\delta n, 0, 0)_q$-resilient wiretap protocol with block length $n$ and
rate $1 - \delta$ that attains perfect privacy. □
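To make the encoder/decoder pair of this corollary tangible, here is a minimal sketch over a prime field. It assumes a Reed-Solomon generator matrix in Vandermonde form as the decoder, and an encoder that samples a uniformly random preimage by choosing the last $n - k$ symbols at random and solving for the first $k$. The function names, the parameters $p = 5$, $n = 4$, $k = 2$, and this particular way of sampling preimages are assumptions made for illustration, not the construction of [116].

```python
import random
from itertools import combinations, product

def solve_mod_p(M, b, p):
    """Solve the square system M z = b over F_p (M assumed invertible) by Gauss-Jordan elimination."""
    k = len(M)
    aug = [[v % p for v in row] + [bi % p] for row, bi in zip(M, b)]
    for col in range(k):
        pivot = next(r for r in range(col, k) if aug[r][col])
        aug[col], aug[pivot] = aug[pivot], aug[col]
        inv = pow(aug[col][col], -1, p)
        aug[col] = [(v * inv) % p for v in aug[col]]
        for r in range(k):
            if r != col and aug[r][col]:
                f = aug[r][col]
                aug[r] = [(a - f * c) % p for a, c in zip(aug[r], aug[col])]
    return [row[k] for row in aug]

def make_protocol(p, n, k):
    G = [[pow(a, i, p) for a in range(n)] for i in range(k)]   # Reed-Solomon generator
    left = [row[:k] for row in G]                              # invertible k x k Vandermonde block

    def decode(x):
        """Decoder D = the extractor x -> G x^T."""
        return [sum(g * xi for g, xi in zip(row, x)) % p for row in G]

    def encode(msg, tail=None):
        """Encoder = inverter: a uniformly random x with G x^T = msg."""
        if tail is None:
            tail = [random.randrange(p) for _ in range(n - k)]
        tail = list(tail)
        rhs = [(mi - sum(row[k + j] * tail[j] for j in range(n - k))) % p
               for row, mi in zip(G, msg)]
        return solve_mod_p(left, rhs, p) + tail

    return encode, decode

def view_distribution(encode, msg, S, p, n, k):
    """Sorted list of what an intruder observing positions S sees, over all encoder randomness."""
    return sorted(tuple(encode(msg, tail)[i] for i in S)
                  for tail in product(range(p), repeat=n - k))

p, n, k = 5, 4, 2
encode, decode = make_protocol(p, n, k)
msgs = list(product(range(p), repeat=k))
# Perfect privacy: the intruder's view is distributed identically for every message.
perfectly_private = all(
    all(view_distribution(encode, msg, S, p, n, k)
        == view_distribution(encode, msgs[0], S, p, n, k) for msg in msgs)
    for S in combinations(range(n), n - k)
)
```

Here the rate is $k/n = 1/2$ and the intruder may observe any $n - k = 2$ symbols, matching $\delta = 1/2$ in the corollary.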

3.5 A Wiretap Protocol Based on Random Walks 

In this section we describe a wiretap protocol that achieves a rate $R$ within
a constant fraction of the information theoretically optimal value $1 - \delta$ (the
constant depending on the alphabet size).

To achieve our result, we will modify the symbol-fixing extractor of Kamp
and Zuckerman [88], which is based on random walks on expander graphs, to
make it efficiently invertible without affecting its extraction properties, and
then apply Lemma 3.15 above to obtain the desired wiretap protocol.



Before we proceed, let us briefly review some basic notions and facts related
to expander graphs. For a detailed review of the theory of expander graphs,
refer to the excellent survey by Hoory, Linial and Wigderson [82], and books
[109, 112].

We will be working with directed regular expander graphs that are
obtained from undirected graphs by replacing each undirected edge with two
directed edges in opposite directions. Let $G = (V, E)$ be a $d$-regular graph.
Then a labeling of the edges of $G$ is a function $L\colon V \times [d] \to V$ such that for
every $u \in V$ and $t \in [d]$, the edge $(u, L(u, t))$ is in $E$. The labeling is consistent
if whenever $L(u, t) = L(v, t)$, then $u = v$. Note that the natural labeling of a
Cayley graph (cf. [82]) is in fact consistent.
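The consistency condition says that, for each fixed label $t$, the map $u \mapsto L(u, t)$ is a permutation of the vertices. A minimal sketch, assuming the Cayley graph of the cyclic group $\mathbb{Z}_N$ with a symmetric generator set and labels numbered from 0 rather than 1; the function names and the counterexample labeling are illustrative assumptions.

```python
def cayley_labeling(N, gens):
    """Natural labeling of the Cayley graph of Z_N: the t-th neighbor of u is u + gens[t]."""
    def L(u, t):
        return (u + gens[t]) % N
    return L

def sorted_labeling(N, gens):
    """A different labeling of the same graph: the t-th neighbor in increasing vertex order."""
    def L(u, t):
        return sorted((u + g) % N for g in gens)[t]
    return L

def is_consistent(L, N, d):
    """True iff L(u, t) = L(v, t) implies u = v, i.e. every label acts as a permutation."""
    return all(len({L(u, t) for u in range(N)}) == N for t in range(d))

N, gens = 8, [1, 7, 3, 5]      # connection set {+1, -1, +3, -3} in Z_8
natural_ok = is_consistent(cayley_labeling(N, gens), N, len(gens))
sorted_ok = is_consistent(sorted_labeling(N, gens), N, len(gens))
```

The second labeling describes the same edges but is not consistent (vertices 0 and 2 share their smallest neighbor), which shows that consistency is a property of the labeling and not of the graph alone.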

A family of $d$-regular graphs is an infinite set of $d$-regular graphs such that
for every $N \in \mathbb{N}$, the set contains a graph with at least $N$ vertices. For a
parameter $c \ge 1$, we will call a family $c$-dense if there is an $N_0 \in \mathbb{N}$ such that,
for every $N \ge N_0$, the family has a graph with at least $N$ and at most $cN$
vertices. We call a family of graphs constructible if all the graphs in the family
have a consistent labeling that is efficiently computable. That is, there is a
uniform, polynomial-time algorithm that, given $N \in \mathbb{N}$ and $i \in [N]$, $j \in [d]$,
outputs the label of the $j$th neighbor of the $i$th vertex, under a consistent
labeling, in the graph in the family that has $N$ vertices (provided that it
exists).

8 In fact, Ozarow and Wyner use a parity check matrix of an MDS code in their
construction, which is indeed a generator matrix for the dual code which is itself MDS.

Let $A$ denote the normalized adjacency matrix of a $d$-regular graph $G$
(that is, the adjacency matrix with all the entries divided by $d$). We denote
by $\lambda_G$ the second largest eigenvalue of $A$ in absolute value. The spectral
gap of $G$ is given by $1 - \lambda_G$. Starting from a probability distribution $p$ on
the set of vertices, represented as a real vector with coordinates indexed by the
vertex set, performing a single-step random walk on $G$ leads to the distribution
defined by $pA$. The following is a well known lemma on the convergence of
the distributions resulting from random walks (see [99] for a proof):

Lemma 3.17. Let $G = (V, E)$ be a $d$-regular undirected graph, and $A$ be its
normalized adjacency matrix. Then for any probability vector $p$, we have

$$\|pA - \mathcal{U}_V\|_2 \le \lambda_G \|p - \mathcal{U}_V\|_2,$$

where $\|\cdot\|_2$ denotes the $\ell_2$ norm. □
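The contraction in Lemma 3.17 can be observed numerically on a small graph. The sketch below assumes the circulant graph on $\mathbb{Z}_7$ with connection set $\{\pm 1, \pm 2\}$, chosen only because the eigenvalues of circulant graphs have a closed form, so that $\lambda_G$ can be computed without a linear algebra library; it is not claimed to belong to a good expander family.

```python
import math

N, gens = 7, [1, 2, 5, 6]      # circulant graph on Z_7 with connection set {+1, +2, -2, -1}
d = len(gens)

def step(p):
    """One step of the random walk: returns the row vector pA."""
    out = [0.0] * N
    for u, mass in enumerate(p):
        for s in gens:
            out[(u + s) % N] += mass / d
    return out

def l2_to_uniform(p):
    """Euclidean distance between p and the uniform distribution on the vertices."""
    return math.sqrt(sum((x - 1.0 / N) ** 2 for x in p))

# For a symmetric connection set the eigenvalues of A are
# (1/d) * sum_s cos(2*pi*j*s/N), j = 0..N-1; lambda_G is the largest
# absolute value among the nontrivial ones (j != 0).
lam = max(abs(sum(math.cos(2 * math.pi * j * s / N) for s in gens)) / d
          for j in range(1, N))

# Follow a walk started at a single vertex and record the distances to uniform.
p = [1.0] + [0.0] * (N - 1)
distances = [l2_to_uniform(p)]
for _ in range(5):
    p = step(p)
    distances.append(l2_to_uniform(p))
```

Each entry of `distances` should be at most `lam` times the previous one, up to floating-point error.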

The extractor of Kamp and Zuckerman [88] starts with a fixed vertex in
a large expander graph and interprets the input as the description of a walk
on the graph. Then it outputs the label of the vertex reached at the end of
the walk. Notice that a direct approach to invert this function will amount to
sampling a path of a particular length between a pair of vertices in the graph,
uniformly among all the possibilities, which might be a difficult problem for
good families of expander graphs$^{9}$. We work around this problem by choosing
the starting point of the walk from the input$^{10}$. The price that we pay by doing
so is a slightly larger error compared to the original construction of Kamp and
Zuckerman that is, asymptotically, of little significance. In particular we show
the following:

Theorem 3.18. Let $G$ be a constructible $d$-regular graph with $d^m$ vertices and
second largest eigenvalue $\lambda_G \ge 1/\sqrt{d}$. Then there exists an explicit invertible



9 In fact intractability of the easier problem of finding a loop in certain families of
expander graphs forms the underlying basis for a class of cryptographic hash functions (cf.
[25]). Even though this easier problem has been solved in [151], uniform sampling of paths
seems to be much more difficult.

10 The idea of choosing the starting point of the walk from the input sequence has been
used before in extractor constructions [172], but in the context of seeded extractors for
general sources with high entropy.

Figure 3.3: The random-walk symbol-fixing extractor.

Proof. We first describe the extractor and its inverse. Given an input $(v, w) \in
[d]^m \times [d]^{n-m}$, the function SFExt interprets $v$ as a vertex of $G$ and $w$ as the
description of a walk starting from $v$. The output is the index of the vertex
reached at the end of the walk. Figure 3.3 depicts the procedure. The 4-regular
graph shown in this toy example has 8 vertices labeled with binary sequences
of length 3. Edges of the graph are consistently labeled at both endpoints
with the set of labels $\{1, 2, 3, 4\}$. The input sequence $(0, 1, 0 \mid 2, 3, 4, 2, 4)$
shown below the graph describes a walk starting from the vertex 010 and
following the path shown by the solid arrows. The output of the extractor is
the label of the final vertex 011.

The inverter Inv works as follows: Given $x \in [d]^m$, $x$ is interpreted as a
vertex of $G$. Then Inv picks $W \in [d]^{n-m}$ uniformly at random. Let $V$ be
the vertex starting from which the walk described by $W$ ends up in $x$. The
inverter outputs $(V, W)$. It is easy to verify that Inv satisfies the properties of
a 0-inverter.
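The mechanics of SFExt and Inv can be sketched in a few lines. The code below assumes, purely for illustration, the Cayley graph of $\mathbb{Z}_{d^m}$ with the natural consistent labeling and symbols numbered from 0; such a graph is a stand-in that is not claimed to be a good expander, so the sketch demonstrates invertibility and not the extraction quality. The names `make_sfext`, `sfext` and `inv` are hypothetical.

```python
import random
from itertools import product

def make_sfext(d, m, gens):
    """Random-walk map on the Cayley graph of Z_{d^m} with labeling L(u, t) = u + gens[t]."""
    N = d ** m

    def to_vertex(v):
        """Read v in [d]^m as a base-d integer."""
        idx = 0
        for digit in v:
            idx = idx * d + digit
        return idx

    def to_symbols(idx):
        """Write a vertex index back as m base-d symbols."""
        digits = []
        for _ in range(m):
            digits.append(idx % d)
            idx //= d
        return digits[::-1]

    def sfext(x):
        """Start at the vertex given by the first m symbols and walk along the rest."""
        u = to_vertex(x[:m])
        for t in x[m:]:
            u = (u + gens[t]) % N
        return to_symbols(u)

    def inv(y, walk_len, rng=random):
        """Pick a random walk W, then trace it backwards from y to find the start V."""
        w = [rng.randrange(d) for _ in range(walk_len)]
        u = to_vertex(y)
        for t in reversed(w):
            u = (u - gens[t]) % N      # undo one step; well defined since L is consistent
        return to_symbols(u) + w

    return sfext, inv

d, m, n = 4, 2, 7
sfext, inv = make_sfext(d, m, gens=[1, 15, 3, 13])

# On a uniform input every output vertex is hit equally often.
counts = {}
for x in product(range(d), repeat=n):
    y = tuple(sfext(list(x)))
    counts[y] = counts.get(y, 0) + 1
```

Tracing the walk backwards is possible precisely because the labeling is consistent: each label acts as a permutation of the vertices, so every step has a unique predecessor.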

Now we show that SFExt is an extractor with the given parameters. We
will follow the same line of argument as in the original proof of Kamp and
Zuckerman. Let $(x, w) \in [d]^m \times [d]^{n-m}$ be a vector sampled from an $(n, k)_d$
symbol-fixing source, and let $u := \mathrm{SFExt}(x, w)$. Recall that $u$ can be seen as
the vertex of $G$ reached at the end of the walk described by $w$ starting from
$x$. Let $p_i$ denote the probability vector corresponding to the walk right after
the $i$th step, for $i = 0, \ldots, n - m$, and denote by $p$ the uniform probability
vector on the vertices of $G$. Our goal is to bound the error $\epsilon$ of the extractor,
which is half the $\ell_1$ norm of $p_{n-m} - p$.

Suppose that $x$ contains $k_1$ random symbols and the remaining $k_2 := k - k_1$
random symbols are in $w$. Then $p_0$ has the value $d^{-k_1}$ at $d^{k_1}$ of the coordinates
and zeros elsewhere, hence

$$\|p_0 - p\|_2^2 = d^{k_1}(d^{-k_1} - d^{-m})^2 + (d^m - d^{k_1})d^{-2m} = d^{-k_1} - d^{-m} \le d^{-k_1}.$$

Now for each $i \in [n - m]$, if the $i$th step of the walk corresponds to a
random symbol in $w$, the $\ell_2$ distance is multiplied by (at most) $\lambda_G$ by Lemma 3.17.
Otherwise the distance remains the same due to the fact that the labeling of
$G$ is consistent. Hence we obtain $\|p_{n-m} - p\|_2^2 \le d^{-k_1} \lambda_G^{2k_2}$. Translating this
into the $\ell_1$ norm by using the Cauchy-Schwarz inequality, we obtain $\epsilon$, namely,

$$\epsilon \le \tfrac{1}{2}\, d^{(m-k_1)/2} \lambda_G^{k_2} < 2^{((m-k_1)\log d + k_2 \log \lambda_G^2)/2}.$$

By our assumption, $\lambda_G \ge 1/\sqrt{d}$. Hence, everything but $k_1$ and $k_2$ being fixed,
the above bound is maximized when $k_1$ is minimized. When $k \le n - m$, this
corresponds to the case $k_1 = 0$, and otherwise to the case $k_1 = k - n + m$.
This gives us the desired upper bound on $\epsilon$. □



Combining this with Lemma 3.15 and setting up the right asymptotic
parameters, we obtain our protocol for the wiretap channel problem.

Corollary 3.19. Let $\delta \in [0, 1)$ and $\gamma > 0$ be arbitrary constants, and suppose
that there is a constructible family of $d$-regular expander graphs with spectral
gap at least $1 - \lambda$ that is $c$-dense, for constants $\lambda < 1$ and $c \ge 1$.

Then, for every large enough $n$, there is a $(\delta n, 2^{-\Omega(n)}, 0)_d$-resilient wiretap
protocol with block length $n$ and rate

$$R = \max\{\alpha(1 - \delta),\ 1 - \delta/\alpha\} - \gamma,$$

where $\alpha := -\log_d \lambda^2$.



Proof. For the case $c = 1$ we use Lemma 3.15 with the extractor SFExt of
Theorem 3.18 and its inverse. Every infinite family of graphs must satisfy
$\lambda \ge 2\sqrt{d-1}/d$ [114], and in particular we have $\lambda \ge 1/\sqrt{d}$, as required by
Theorem 3.18. We choose the parameters $k := (1 - \delta)n$ and
$m := n(\max\{\alpha(1 - \delta),\ 1 - \delta/\alpha\} - \gamma)$, which gives $s = -\Omega(n)$, and hence, exponentially small
error. The case $c > 1$ is similar, but involves technicalities for dealing with
lack of graphs of arbitrary size in the family. We will elaborate on this in
Appendix 3.A. □

Using explicit constructions of Ramanujan graphs that achieve

$$\lambda \le 2\sqrt{d-1}/d$$

when $d - 1$ is a prime power [100, 110, 119], one can obtain $\alpha \ge 1 - 2/\log d$,
which can be made arbitrarily close to one (hence, making the protocol
arbitrarily close to the optimal bound) by choosing a suitable alphabet size that
does not depend on $n$. Namely, we have the following result:

Corollary 3.20. Let $\delta \in [0, 1)$ and $\gamma > 0$ be arbitrary constants. Then, there
is a positive integer $d$ only depending on $\gamma$ such that the following holds: For
every large enough $n$, there is a $(\delta n, 2^{-\Omega(n)}, 0)_d$-resilient wiretap protocol with
block length $n$ and rate at least $1 - \delta - \gamma$. □
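The dependence of the rate on the degree can be explored numerically. The sketch below evaluates $\alpha = -\log_d \lambda^2$ at the Ramanujan bound $\lambda = 2\sqrt{d-1}/d$ and plugs it into the rate expression of Corollary 3.19; the function names are illustrative, and the logarithm in the bound $\alpha \ge 1 - 2/\log d$ is taken to base 2, which is an assumption about the convention used here.

```python
import math

def alpha_ramanujan(d):
    """alpha = -log_d(lambda^2) for lambda = 2*sqrt(d-1)/d (the Ramanujan bound)."""
    lam_sq = 4.0 * (d - 1) / d ** 2
    return -math.log(lam_sq) / math.log(d)

def rate(delta, gamma, alpha):
    """Rate R = max{alpha*(1-delta), 1 - delta/alpha} - gamma from Corollary 3.19."""
    return max(alpha * (1 - delta), 1 - delta / alpha) - gamma

# Degrees d with d - 1 a prime power, as required by the explicit constructions.
degrees = [4, 6, 8, 18, 102, 2 ** 20 + 8]
table = [(d, alpha_ramanujan(d), rate(0.3, 0.0, alpha_ramanujan(d))) for d in degrees]
```

For any fixed $\delta$ the computed rate stays below $1 - \delta$ and approaches it as $d$ grows, which is the content of Corollary 3.20.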

3.6 Invertible Affine Extractors and Asymptotically Optimal Wiretap Protocols

In this section we will construct a black box transformation for making certain 
seedless extractors invertible. The method is described in detail for affine 
extractors, and leads to wiretap protocols with asymptotically optimal rate- 
resilience trade-offs. Being based on affine extractors, these protocols are only 
defined for prime power alphabet sizes. On the other hand, the random-walk
based protocol discussed in Section 3.5 can be potentially instantiated
for an arbitrary alphabet size, though achieving asymptotically sub-optimal
parameters (and a positive rate only for an alphabet of size 3 or more).

Modulo some minor differences, the construction can be simply described 
as follows: A seedless affine extractor is first used to extract a small number 
of uniform random bits from the source, and the resulting sequence is then 
used as the seed for a seeded extractor that extracts almost the entire entropy 
of the source. 

Of course, seeded extractors in general are not guaranteed to work if (as 
in the above construction) their seed is not independent from the source. 
However, as observed by Gabizon and Raz [65], a linear seeded extractor can 
extract from an affine source if the seed is the outcome of an affine extractor on
the source. This idea was formalized in a more general setting by Shaltiel [132].

Shaltiel's result gives a general framework for transforming any seedless 
extractor (for a family of sources satisfying a certain closedness condition) 
with short output length to one with an almost optimal output length. The 
construction uses the imperfect seedless extractor to extract a small number of 
uniform random bits from the source, and will then use the resulting sequence 
as the seed for a seeded extractor to extract more random bits from the source. 
For a suitable choice of the seeded extractor, one can use this construction to 
extract almost all min-entropy of the source. 

The closedness condition needed for this result to work for a family $\mathcal{C}$
of sources is that, letting $E(x, s)$ denote the seeded extractor with seed $s$, for
every $X \in \mathcal{C}$ and every fixed $s$ and $y$, the distribution $(X \mid E(X, s) = y)$ belongs
to $\mathcal{C}$. If $E$ is a linear function for every fixed $s$, the result will be available for
affine sources (since we are imposing a linear constraint on an affine source, it
remains an affine source). A more precise statement of Shaltiel's main result
is the following:

Theorem 3.21. [132] Let $\mathcal{C}$ be a class of distributions on $\mathbb{F}_2^n$ and $F\colon \mathbb{F}_2^n \to \mathbb{F}_2^t$
be an extractor for $\mathcal{C}$ with error $\epsilon$. Let $E\colon \mathbb{F}_2^n \times \mathbb{F}_2^t \to \mathbb{F}_2^m$ be a function for
which $\mathcal{C}$ satisfies the closedness condition above. Then for every $X \in \mathcal{C}$,
$E(X, F(X)) \sim_{\epsilon \cdot 2^{t+3}} E(X, \mathcal{U}_t)$. □

Recall that a seeded extractor is called linear if it is a linear function for
every fixed choice of the seed, and that this condition is satisfied by Trevisan's
extractor [152]. For our construction, we will use the following theorem
implied by the improvement of this extractor due to Raz, Reingold and Vadhan
(Theorem 2.20):

Theorem 3.22. [123] There is an explicit strong linear seeded $(k, \epsilon)$-extractor
$\mathrm{Ext}\colon \mathbb{F}_2^n \times \mathbb{F}_2^d \to \mathbb{F}_2^m$ with $d = O(\log^3(n/\epsilon))$ and $m = k - O(d)$. □

Remark 3.23. We note that our arguments would identically work for any
other linear seeded extractor as well, for instance those constructed in [134,
150]. However, the most crucial parameter in our application is the output
length of the extractor, being closely related to the rate of the wiretap
protocols we obtain. Among the constructions we are aware of, the result quoted in
Theorem 3.22 is the best in this regard. Moreover, an affine seeded extractor
with better parameters is constructed by Gabizon and Raz [65], but it requires
a large alphabet size to work.

Now, having the right tools in hand, we are ready to formally describe
our construction of invertible affine extractors with nearly optimal output
length. Broadly speaking, the construction follows the abovementioned idea
of Shaltiel, Gabizon, and Raz [65, 132] on enlarging the output length of affine
extractors, with an additional "twist" for making the extractor invertible. For
concreteness, the description is given over the binary field $\mathbb{F}_2$:

Theorem 3.24. For every constant $\delta \in (0, 1]$ and every $\alpha \in (0, 1)$, there is
an explicit invertible affine extractor $D\colon \mathbb{F}_2^n \to \mathbb{F}_2^m$ for min-entropy $\delta n$ with
output length $m = \delta n - O(n^\alpha)$ and error at most $O(2^{-n^{\alpha/3}})$.

Proof. Let $\epsilon := 2^{-n^{\alpha/3}}$, and $t := O(\log^3(n/\epsilon)) = O(n^\alpha)$ be the seed length
required by the extractor Ext in Theorem 3.22 for input length $n$ and error
$\epsilon$, and further, let $n' := n - t$. Set up Ext for input length $n'$, min-entropy
$\delta n - t$, seed length $t$ and error $\epsilon$. Also set up Bourgain's extractor AExt for
input length $n'$ and entropy rate $\delta'$, for an arbitrary constant $\delta' < \delta$. Then
the function $D$ will view the $n$-bit input sequence as a tuple $(s, x)$, $s \in \mathbb{F}_2^t$ and
$x \in \mathbb{F}_2^{n'}$, and outputs $\mathrm{Ext}(x, s + \mathrm{AExt}(x)|_{[t]})$. This is depicted in Figure 3.4.

Figure 3.4: Construction of the invertible affine extractor.



First we show that this is an affine extractor. Suppose that $(S, X) \in
\mathbb{F}_2^t \times \mathbb{F}_2^{n'}$ is a random variable sampled from an affine distribution with
min-entropy $\delta n$. The variable $S$ can have an affine dependency on $X$. Hence, for
every fixed $s \in \mathbb{F}_2^t$, the distribution of $X$ conditioned on the event $S = s$ is
affine with min-entropy at least $\delta n - t$, which is at least $\delta' n'$ for large enough $n$.
Hence $\mathrm{AExt}(X)$ will be $2^{-\Omega(n)}$-close to uniform by Theorem 3.9. This implies
that $\mathrm{AExt}(X)|_{[t]} + S$ can extract $t$ random bits from the affine source with error
$2^{-\Omega(n)}$. Combining this with Theorem 3.21, noticing the fact that the class of
affine sources is closed with respect to linear seeded extractors, we conclude
that $D$ is an affine extractor with error at most $\epsilon + 2^{-\Omega(n)} \cdot 2^{t+3} = O(2^{-n^{\alpha/3}})$.

Now the inverter works as follows: Given $y \in \mathbb{F}_2^m$, first it picks $Z \in
\mathbb{F}_2^t$ uniformly at random. The seeded extractor Ext, given the seed $Z$, is a
linear function $\mathrm{Ext}_Z\colon \mathbb{F}_2^{n'} \to \mathbb{F}_2^m$. Without loss of generality, assume that this
function is surjective$^{11}$. Then the inverter picks $X \in \mathbb{F}_2^{n'}$ uniformly at random
from the affine subspace defined by the linear constraint $\mathrm{Ext}_Z(X) = y$, and
outputs $(Z + \mathrm{AExt}(X)|_{[t]}, X)$. It is easy to verify that the output is indeed
a valid preimage of $y$. To see the uniformity of the inverter, note that if $y$
is chosen uniformly at random, the distribution of $(Z, X)$ will be uniform on
$\mathbb{F}_2^n$. Hence $(Z + \mathrm{AExt}(X)|_{[t]}, X)$, which is the output of the inverter, will be
uniform. □
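The inversion argument does not depend on the quality of the two extractors, only on $\mathrm{Ext}_Z$ being linear and surjective for every seed. This can be seen on a toy instance. In the sketch below, `EXT_MATRICES` and `aext_t` are hypothetical stand-ins chosen for illustration; they are not the Raz-Reingold-Vadhan or Bourgain extractors, and the sizes $t = 2$, $n' = 4$, $m = 2$ are far too small to extract anything meaningful. The inverter's randomness is passed explicitly so that all outcomes can be enumerated.

```python
from itertools import product

T, NP, M = 2, 4, 2   # seed length t, inner input length n', output length m (toy sizes)

# Hypothetical stand-in for Ext: one full-rank 2 x 4 binary matrix per seed.
EXT_MATRICES = {
    (0, 0): [[1, 0, 0, 0], [0, 1, 0, 0]],
    (0, 1): [[0, 0, 1, 0], [0, 0, 0, 1]],
    (1, 0): [[1, 0, 1, 0], [0, 1, 0, 1]],
    (1, 1): [[1, 1, 0, 0], [0, 0, 1, 1]],
}

def ext(x, seed):
    """Linear seeded map Ext_seed(x) over F_2."""
    return tuple(sum(a * b for a, b in zip(row, x)) % 2 for row in EXT_MATRICES[seed])

def aext_t(x):
    """Hypothetical stand-in for AExt(x) restricted to its first t bits."""
    return ((x[0] & x[1]) ^ x[2], (x[1] & x[3]) ^ x[0])

def xor(u, v):
    return tuple(a ^ b for a, b in zip(u, v))

def D(s, x):
    """D(s, x) = Ext(x, s + AExt(x)|_[t])."""
    return ext(x, xor(s, aext_t(x)))

def invert(y, z, choice):
    """Inverter with explicit randomness: seed z and an index into the preimage of y under Ext_z."""
    preimage = [x for x in product((0, 1), repeat=NP) if ext(x, z) == y]
    x = preimage[choice % len(preimage)]
    return xor(z, aext_t(x)), x

# Enumerate all messages y and all inverter randomness (z, choice).
outputs = []
for y in product((0, 1), repeat=M):
    for z in product((0, 1), repeat=T):
        for choice in range(2 ** (NP - M)):
            s, x = invert(y, z, choice)
            outputs.append((y, s + x))
```

Every listed pair satisfies `D(s, x) == y`, and the 64 inverter outputs are pairwise distinct, so a uniform message together with uniform randomness yields the uniform distribution on $\mathbb{F}_2^{t+n'}$, as claimed in the proof.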

In the above construction we are using an affine and a linear seeded extrac- 
tor as black boxes, and hence, they can be replaced by any other extractors as 
well (the construction will achieve an optimal rate provided that the seeded 
extractor extracts almost the entire source entropy). In particular, over large
fields one can use the affine and seeded extractors given by Gabizon and Raz
[65] that work for sub-constant entropy rates as well.

Moreover, for concreteness we described and instantiated our construction 
over the binary field. Observe that Shaltiel's result, for the special case of 
affine sources, holds regardless of the alphabet size. Moreover, Trevisan's lin- 
ear seeded extractor can be naturally extended to handle arbitrary alphabets. 
Hence, in order to extend our result to non-binary alphabets, it suffices to 
ensure that a suitable seedless affine extractor that supports the desired
alphabet size is available. Bourgain's original result [16] is stated and proved
for the binary alphabet; however, it seems that this result can be adapted
to work for larger fields as well [17]. Such an extension (along with some
improvements and simplifications) is made explicit by Yehudayoff [167].



An affine extractor is, in particular, a symbol-fixing extractor. Hence
Theorem 3.24 combined with Lemma 3.15 gives us a wiretap protocol with almost
optimal parameters:

Theorem 3.25. Let $\delta \in [0, 1)$ and $\alpha \in (0, 1/3)$ be constants. Then for a
prime power $q > 1$ and every large enough $n$ there is a $(\delta n, O(2^{-n^\alpha}), 0)_q$-resilient
wiretap protocol with block length $n$ and rate $1 - \delta - o(1)$. □

3.7 Further Applications 

In this section we will sketch some important applications of our technique to 
more general wiretap problems. 



11 Because the seeded extractor is strong and linear, for most choices of the seed it is a
good extractor (by Proposition 2.11), and hence necessarily surjective (if not, one of the
output symbols would linearly depend on the others and obviously the output distribution
would not be close to uniform). Hence if Ext is not surjective for some seed $z$, one can
replace it by a trivial surjective linear mapping without affecting its extraction properties.

3.7.1 Noisy Channels and Active Intruders

Suppose that Alice wants to transmit a particular sequence to Bob through a 
noisy channel. She can use various techniques from coding theory to encode 
her information and protect it against noise. Now what if there is an intruder 
who can partially observe the transmitted sequence and even manipulate it? 
Modification of the sequence by the intruder can be regarded in the same 
way as the channel noise; thus one gets security against active intrusion as 
a "bonus" by constructing a code that is resilient against noise and passive 
eavesdropping. There are two natural and modular approaches to construct 
such a code. 

A possible attempt would be to first encode the message using a good
error-correcting code and then apply a wiretap encoder to protect the encoded
sequence against the wiretapper. However, this will not necessarily keep the
information protected against the channel noise, as the combination of the
wiretap encoder and decoder does not have to be resistant to noise.

Another attempt is to first use a wiretap encoder and then apply an
error-correcting code to the resulting sequence. Here it is not necessarily the case
that the information will be kept secure against intrusion anymore, as the
wiretapper now gets to observe the bits from the channel-encoded sequence
that may reveal information about the original sequence. However, the
wiretap protocol given in Theorem 3.25 is constructed from an invertible affine
extractor, and guarantees resiliency even if the intruder is allowed to observe
arbitrary linear combinations of the transmitted sequence (in this case, the
distribution of the encoded sequence subject to the intruder's observation
becomes an affine source and thus, the arguments of the proof of Lemma 3.15
remain valid). In particular, Theorem 3.25 holds even if the intruder's
observation is allowed to be obtained after applying any arbitrary linear mapping
on the output of the wiretap encoder. Hence, we can use the wiretap scheme
as an outer code and still ensure privacy against an active intruder and
reliability in presence of a noisy channel, provided that the error-correcting code
being used as the inner code is linear. This immediately gives us the following
result:

Theorem 3.26. Suppose that there is a $q$-ary linear error-correcting code
with rate $r$ that is able to correct up to a $\tau$ fraction of errors (via unique or
list decoding). Then for every constant $\delta \in [0, 1)$ and $\alpha \in (0, 1/3)$ and large
enough $n$, there is a $(\delta n, O(2^{-n^{\alpha}}), 0)_q$-resilient wiretap protocol with block
length $n$ and rate $r - \delta - o(1)$ that can also correct up to a $\tau$ fraction of
errors. □



The setting discussed above is shown in Figure 3.5. The same idea can be
used to protect fountain codes, e.g., LT codes [101] and Raptor codes [137], against
wiretappers without affecting the error correction capabilities of the code.

Figure 3.5: Wiretap scheme composed with channel coding. If the wiretap
scheme is constructed by an invertible affine extractor, it can guarantee secrecy
even in presence of arbitrary linear manipulation of the information. Active
intrusion can be defied using an error-correcting inner code.

Obviously this simple composition idea can be used for any type of channel
so long as the inner code is linear, at the cost of reducing the total rate by
almost $\delta$. Hence, if the inner code achieves the Shannon capacity of the direct
channel (in the absence of the wiretapper), the composed code will achieve the
capacity of the wiretapped channel, which is less than the original capacity
by $\delta$ gl].

3.7.2 Network Coding 

Our wiretap protocol from invertible affine extractors is also applicable in the 
more general setting of transmission over networks. A communication network 
can be modeled as a directed graph, in which nodes represent the network 
devices and information is transmitted along the edges. One particular node 
is identified as the source and m nodes are identified as receivers. The main 
problem in network coding is to have the source reliably transmit information 
to the receivers at the highest possible rate, while allowing the intermediate 
nodes to arbitrarily process the information along the way.

Suppose that, in the graph that defines the topology of the network, the
min-cut from the source to each receiver is $n$. It was shown in [4] that the
source can transmit information up to rate $n$ (symbols per transmission) to
all receivers (which is optimal), and in [94, 97] that linear network coding is
in fact sufficient to achieve this rate. That is, the transmission at rate $n$ is
possible when the intermediate nodes are allowed to forward packets that are
(as symbols over a finite field) linear combinations of the packets that they
receive (see [168] for a comprehensive account of these and other relevant
results).

A basic example is shown by the butterfly network in Figure 3.6. This
network consists of a source on the top and two receivers on the bottom,
where the min-cut to each receiver is 2. Without processing the incoming
data, as in the left figure, one of the two receivers may receive information
at the optimal rate of 2 symbols per transmission (namely, receiver 1 in the
figure). However, due to the bottleneck existing in the middle (shown by the
thick edge $a \to b$), the other receiver will be forced to receive at an inferior rate
of 1 symbol per transmission. If linear processing of the information
is allowed, on the other hand, node $a$ may combine its incoming information by
treating packets as symbols over a finite field and adding them up, as in the
right figure. Both receivers may then solve a full-rank system of linear
equations to retrieve the original source symbols $x_1$ and $x_2$, and thereby
achieve the optimal min-cut rate.

Figure 3.6: Network coding (right), versus unprocessed forwarding (left).
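The butterfly example can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the names `butterfly` and `recover` are hypothetical, packets are modeled as integers added by XOR (addition over a field of characteristic 2), and the choice of which symbol crosses the bottleneck in the uncoded case is made here for concreteness rather than read off the figure.

```python
def butterfly(x1, x2, coding):
    """One use of the butterfly network; packets are ints, added via XOR.

    Receiver 1 hears x1 on its direct path, receiver 2 hears x2, and both
    hear whatever node a puts on the bottleneck edge a -> b.
    """
    bottleneck = (x1 ^ x2) if coding else x2  # uncoded: a forwards one packet only
    return (x1, bottleneck), (x2, bottleneck)


def recover(direct, mixed):
    """Solve the 2x2 linear system: the missing symbol is direct XOR mixed."""
    return direct ^ mixed


x1, x2 = 0b1011, 0b0110
r1, r2 = butterfly(x1, x2, coding=True)
got_at_1 = (r1[0], recover(*r1))  # receiver 1 reconstructs (x1, x2)
got_at_2 = (recover(*r2), r2[0])  # receiver 2 reconstructs (x1, x2)

u1, u2 = butterfly(x1, x2, coding=False)  # receiver 2 now sees x2 twice
```

With coding enabled both receivers obtain two symbols per network use; without it, the second receiver learns nothing about `x1`.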

Designing wiretap protocols for networks is an important question in
network coding, which was first posed by Cai and Yeung [21]. In this problem,
an intruder can choose a bounded number, say $t$, of the edges and eavesdrop
all the packets going through those edges. Cai and Yeung designed a network code
that could provide the optimal multicast rate of $n - t$ with perfect privacy.
However, this code requires an alphabet size of order $\binom{|E|}{t}$, where $E$ is the set
of edges. Their result was later improved in [59], where it was shown that a random
linear coding scheme can provide privacy with a much smaller alphabet size
if one is willing to achieve a slightly sub-optimal rate. Namely, they obtain
rate $n - t(1 + \epsilon)$ with an alphabet of size roughly $O(|E|^{1/\epsilon})$, and show that
achieving the exact optimal rate is not possible with small alphabet size.

El Rouayheb and Soljanin [55] suggested to use the original code of Ozarow
and Wyner [116] as an outer code at the source and showed that a careful
choice of the network code can provide optimal rate with perfect privacy.
However, their code eventually needs an alphabet of size at least
$\binom{|E| - 1}{t - 1} + m$.

Figure 3.7: Linear network coding with an outer layer of wiretap encoding
added for providing secrecy.

Building upon this work, Silva and Kschischang [95] constructed an outer
code that provides similar results while leaving the underlying network code
unchanged. However, their result comes at the cost of increasing the packet
size by a multiplicative factor of at least the min-cut bound, $n$ (or in
mathematical terms, the original alphabet size $q$ of the network is enlarged to at
least $q^n$). For practical purposes, this is an acceptable solution provided that
an estimate on the min-cut size of the network is available at the wiretap
encoder.



By the discussion presented in Section 3.7.1, the rate-optimal wiretap
protocol given in Theorem 3.25 stays resilient even in presence of any linear
post-processing of the encoded information. Thus, using the wiretap encoder
given by this result as an outer code in the source node, one can construct an
asymptotically optimal wiretap protocol for networks that is completely
unaware of the network and eliminates all the restrictions in the above results.
This is schematically shown in Figure 3.7. Hence, extending our notion of
$(t, \epsilon, \gamma)_q$-resilient wiretap protocols naturally to communication networks, we
obtain the following:

Theorem 3.27. Let $\delta \in [0, 1)$ and $\alpha \in (0, 1/3)$ be constants, and consider
a network that uses a linear coding scheme over a finite field $\mathbb{F}_q$ for reliably
transmitting information at rate $R$. Suppose that, at each transmission, an
intruder can arbitrarily observe up to $\delta R$ intermediate links in the network.
Then the source and the receiver nodes can use an outer code of rate $1 - \delta - o(1)$
(obtaining a total rate of $R(1 - \delta) - o(1)$) which is completely independent of
the network, leaves the network code unchanged, and provides almost perfect
privacy with error $O(2^{-R^{\alpha}})$ and zero leakage over a $q$-ary alphabet. □

In addition to the above result that uses the invertible affine extractor of
Theorem 3.24, it is possible to use other rate-optimal invertible affine
extractors. In particular, observe that the restricted affine extractor of Theorem 3.11
(and in particular, Corollary 3.12) is a linear function (over the extension field)
and thus obviously has an efficient 0-inverter (since inverting the extractor
amounts to solving a system of linear equations). By using this extractor
(instantiated with Gabidulin's MRD codes as in Corollary 3.12), we may recover
the result of Silva and Kschischang [95] in our framework. More precisely, we
have the following result:

Corollary 3.28. Let $q$ be any prime power, and consider a network with
minimum cut of size $n$ that uses a linear coding scheme over $\mathbb{F}_q$ for reliably
transmitting information at rate $R$. Suppose that, at each transmission, an
intruder can arbitrarily observe up to $\delta R$ intermediate links in the network,
for some $\delta \in [0, 1)$. Then the source and the receiver nodes can use an outer
code of rate $1 - \delta$ over $\mathbb{F}_{q^n}$ (obtaining a total rate of $R(1 - \delta)$) that provides
perfect privacy over a $q^n$-ary alphabet. □

3.7.3 Arbitrary Processing 

In this section we consider the erasure wiretap problem in its most general 
setting, which is still of practical importance. Suppose that the information 
emitted by the source goes through an arbitrary communication medium and 
is arbitrarily processed on the way to provide protection against noise, to 
obtain better throughput, or for other reasons. Now consider an intruder who 
is able to eavesdrop a bounded amount of information at various points of 
the channel. One can model this scenario in the same way as the original 
point-to-point wiretap channel problem, with the difference that instead of 
observing t arbitrarily chosen bits, the intruder now gets to choose an arbitrary 
Boolean circuit C with t output bits (which captures the accumulation of all the 
intermediate processing) and observes the output of the circuit when applied 
to the transmitted sequence 12 . 

Obviously there is no way to guarantee resiliency in this setting, since the 
intruder can simply choose C to compute t output bits of the wiretap decoder. 
However, suppose that in addition there is an auxiliary communication channel 
between the source and the receiver (that we call the side channel) that is 
separated from the main channel, and hence, the information passed through
the two channels is not blended together by the intermediate processing.



12 In fact this models a "harder" problem, as in our problem the circuit $C$ is given by the
communication scheme and not the intruder. Nevertheless, we consider the harder problem.

Figure 3.8: The wiretap channel problem in presence of arbitrary intermediate
processing. In this example, data is transmitted over a packet network (shown
as a cloud) in which some intermediate links (shown by the dashed arrows)
are accessible to an intruder.



We call this scenario the general wiretap problem, and extend our notion
of $(t, \epsilon, \gamma)$-resilient protocol to this problem, with the slight modification that
now the output of the encoder (and the input of the decoder) is a pair of
strings $(y_1, y_2) \in \mathbb{F}_2^n \times \mathbb{F}_2^d$, where $y_1$ (resp., $y_2$) is sent through the main (resp.,
side) channel. Now we call $n + d$ the block length and let the intruder choose
an arbitrary pair of circuits $(C_1, C_2)$, one for each channel, that output a total
of $t$ bits, and observe $(C_1(y_1), C_2(y_2))$.

The information-theoretic upper bounds for the achievable rates in the
original wiretap problem obviously extend to the general wiretap problem as
well. Below we show that for the general problem, secure transmission is
possible at asymptotically optimal rates even if the intruder intercepts the entire
communication passing through the side channel (as shown in Figure 3.8).

As before, our idea is to use invertible extractors to construct
general wiretap protocols, but this time we use invertible strong seeded extractors.
Strong seeded extractors were used in [22] to construct ERFs, and this is
exactly what we use as the decoder in our protocol. As the encoder we will use
the corresponding inverter, which outputs a pair of strings, one for the
extractor's input which is sent through the main channel and another as the seed
which is sent through the side channel. Hence we will obtain the following
result:



Theorem 3.29. Let $\delta \in [0, 1)$ be a constant. Then for every $\alpha, \epsilon > 0$, there
is a $(\delta n, \epsilon, 2^{-\alpha n} + \epsilon)$-resilient wiretap protocol for the general wiretap channel
problem that sends $n$ bits through the main channel and $d := O(\log(n/\epsilon^2))$
bits through the side channel and achieves rate $1 - \delta - \alpha - O(d/(n + d))$.
The protocol is secure even when the entire communication through the side
channel is observable by the intruder.

Proof. We will need the following claim in our proof, which is easy to verify
using an averaging argument:

Claim. Let $f\colon \{0,1\}^n \to \{0,1\}^{\delta n}$ be a Boolean function. Then for every
$\alpha > 0$, and $X \sim \mathcal{U}_n$, the probability that $f(X)$ has fewer than $2^{n(1 - \delta - \alpha)}$
preimages is at most $2^{-\alpha n}$.
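The averaging argument behind the claim can be spelled out as follows. This is a sketch under the assumption that $f$ has $\delta n$ output bits, which is how the claim is used in the proof; the term "light" is introduced here only for convenience. Call an output $w$ light if it has fewer than $2^{n(1 - \delta - \alpha)}$ preimages. Since there are at most $2^{\delta n}$ outputs in total,

```latex
\Pr_{X \sim \mathcal{U}_n}\bigl[f(X) \text{ is light}\bigr]
  = \sum_{w \text{ light}} \frac{|f^{-1}(w)|}{2^n}
  < 2^{\delta n} \cdot \frac{2^{n(1 - \delta - \alpha)}}{2^n}
  = 2^{-\alpha n}.
```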



Now, let Ext be the linear seeded extractor of Theorem 3.22, set up for
input length $n$, seed length $d = O(\log(n/\epsilon^2))$, min-entropy $n(1 - \delta - \alpha)$,
output length $m = n(1 - \delta - \alpha) - O(d)$, and error $\epsilon^2$. Then the encoder chooses
a seed $Z$ for the extractor uniformly at random and sends it through the side
channel.

For the chosen value of $Z$, the extractor is a linear function, and as before,
given a message $x \in \{0, 1\}^m$, the encoder picks a random vector in the affine
subspace that is mapped by this linear function to $x$ and sends it through the
main channel.

The decoder, in turn, applies the extractor to the seed received from the
side channel and the transmitted string. The resiliency of the protocol
can be shown in a similar manner as in Lemma 3.15. Specifically, note that
by the above claim, with probability at least $1 - 2^{-\alpha n}$, the string transmitted
through the main channel, conditioned on the observation of the intruder from
the main channel, has a distribution $\mathcal{Y}$ with min-entropy at least $n(1 - \delta - \alpha)$.
Now in addition suppose that the seed $z$ is entirely revealed to the intruder.
As the extractor is strong, with probability at least $1 - \epsilon$, $z$ is a good seed for
$\mathcal{Y}$, meaning that the output of the extractor applied to $\mathcal{Y}$ and seed $z$ is $\epsilon$-close
to uniform (by Proposition 2.11), and hence the view of the intruder on the
original message remains $\epsilon$-close to uniform. □
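The encoder and decoder described in the proof can be illustrated with a toy Python sketch. It is not the construction of Theorem 3.22: as an assumption made purely for illustration, the linear seeded map is a random binary Toeplitz matrix selected by the seed (linear in the input for each fixed seed, but with a seed of $n + m - 1$ bits rather than $O(\log(n/\epsilon^2))$). Resampling seeds whose matrix is not of full rank is likewise a simplification chosen here, and all function names are hypothetical.

```python
import random


def toeplitz_rows(seed_bits, m, n):
    """Rows of the m x n matrix T[i][j] = seed_bits[i - j + n - 1]; bit j = column j."""
    return [sum(seed_bits[i - j + n - 1] << j for j in range(n)) for i in range(m)]


def ext(seed_bits, y, m, n):
    """Linear seeded map: output bit i is the GF(2) inner product of row i and y."""
    return [bin(r & y).count("1") & 1 for r in toeplitz_rows(seed_bits, m, n)]


def random_preimage(rows, x, n, rng):
    """Uniform y with rows * y = x over GF(2); None if the rows are dependent."""
    basis = {}  # pivot column -> fully reduced row; bit n holds the right-hand side
    for r, b in zip(rows, x):
        r |= b << n
        for col, piv in basis.items():
            if (r >> col) & 1:
                r ^= piv
        lhs = r & ((1 << n) - 1)
        if lhs == 0:
            return None  # dependent row: this seed is rejected
        col = (lhs & -lhs).bit_length() - 1  # lowest set bit becomes the pivot
        for c in basis:
            if (basis[c] >> col) & 1:
                basis[c] ^= r
        basis[col] = r
    pivot_mask = sum(1 << c for c in basis)
    free = rng.getrandbits(n) & ~pivot_mask  # random values for the free variables
    y = free
    for col, piv in basis.items():
        if ((piv >> n) & 1) ^ (bin(piv & free).count("1") & 1):
            y |= 1 << col
    return y


def encode(x, n, rng):
    """Return (y, seed): y is sent on the main channel, seed on the side channel."""
    m = len(x)
    while True:
        seed = [rng.randrange(2) for _ in range(n + m - 1)]
        y = random_preimage(toeplitz_rows(seed, m, n), x, n, rng)
        if y is not None:
            return y, seed


def decode(y, seed, m, n):
    """The decoder simply applies the seeded linear map."""
    return ext(seed, y, m, n)
```

The sketch only demonstrates the invertibility mechanics (sampling a uniform element of the affine preimage space by Gaussian elimination); it makes no claim about the security parameters of the theorem.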

We observe that it is not possible to guarantee zero leakage for the
general wiretap problem above. Specifically, suppose that $(C_1, C_2)$ are chosen
in a way that they have a single preimage for a particular output $(w_1, w_2)$.
With nonzero probability the observation of the intruder may turn out to be
$(w_1, w_2)$, in which case the entire message is revealed. Nevertheless, it is
possible to guarantee negligible leakage as the above theorem does. Moreover,
when the general protocol above is used for the original wiretap II problem
(where there is no intermediate processing involved), there is no need for a
separate side channel and the entire encoding can be transmitted through a
single channel. Contrary to Theorem 3.25, however, the general protocol will
not guarantee zero leakage even for this special case.

3.A Some Technical Details

This appendix is devoted to some technical details that are omitted in the 
main text of the chapter. 

The following proposition quantifies the Shannon entropy of a distribution
that is close to uniform:

Proposition 3.30. Let $\mathcal{X}$ be a probability distribution on a finite set $S$,
$|S| > 4$, that is $\epsilon$-close to the uniform distribution on $S$, for some $\epsilon < 1/4$.
Then $H(\mathcal{X}) \ge (1 - \epsilon) \log_2 |S|$.

Proof. Let $n := |S|$, and let $f(x) := -x \log_2 x$. The function $f(x)$ is concave,
passes through the origin and is strictly increasing in the range $[0, 1/e]$. From
the definition, we have $H(\mathcal{X}) = \sum_{s \in S} f(\Pr_{\mathcal{X}}(s))$. For each term $s$ in this
summation, the probability that $\mathcal{X}$ assigns to $s$ is either at least $1/n$, which
makes the corresponding term at least $\log_2 n / n$ (due to the particular range
of $|S|$ and $\epsilon$), or is equal to $1/n - \epsilon_s$, for some $\epsilon_s > 0$, in which case the
term corresponding to $s$ is less than $\log_2 n / n$ by at most $\epsilon_s \log_2 n$ (this follows
by observing that the slope of the line connecting the origin to the point
$(1/n, f(1/n))$ is $\log_2 n$). The bound on the statistical distance implies that
the differences $\epsilon_s$ add up to at most $\epsilon$. Hence, the Shannon entropy of $\mathcal{X}$ can
be less than $\log_2 n$ by at most $\epsilon \log_2 n$. □
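As a quick numerical sanity check of Proposition 3.30, the following Python sketch evaluates both sides of the bound on one small example. The helper names and the particular distribution are assumptions chosen for illustration; the check is of course no substitute for the proof.

```python
import math


def shannon_entropy(p):
    """Shannon entropy (in bits) of a probability vector p."""
    return -sum(q * math.log2(q) for q in p if q > 0)


def distance_from_uniform(p):
    """Statistical distance between p and the uniform distribution on its support set."""
    n = len(p)
    return 0.5 * sum(abs(q - 1 / n) for q in p)


# Uniform on 8 points, with mass 0.1 moved from one point to another.
p = [1 / 8 + 0.1, 1 / 8 - 0.1] + [1 / 8] * 6
eps = distance_from_uniform(p)
entropy = shannon_entropy(p)
bound = (1 - eps) * math.log2(len(p))  # right-hand side of the proposition
```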

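Proposition 3.31. Let $(X, Y)$ be a pair of random variables jointly
distributed on a finite set $\Omega \times \Gamma$. Then, with the notation of footnote 13,
$\mathbb{E}_Y[\mathrm{dist}(X|Y, X)] = \mathbb{E}_X[\mathrm{dist}(Y|X, Y)]$.

Proof. For $x \in \Omega$ and $y \in \Gamma$, we will use shorthands $p_x, p_y, p_{xy}$ to denote
$\Pr[X = x]$, $\Pr[Y = y]$, $\Pr[X = x, Y = y]$, respectively. Then we have

$$
\begin{aligned}
\mathbb{E}_Y[\mathrm{dist}(X|Y, X)]
  &= \sum_{y \in \Gamma} p_y \, \mathrm{dist}(X|(Y = y), X)
   = \frac{1}{2} \sum_{y \in \Gamma} p_y \sum_{x \in \Omega} |p_{xy}/p_y - p_x| \\
  &= \frac{1}{2} \sum_{y \in \Gamma} \sum_{x \in \Omega} |p_{xy} - p_x p_y|
   = \frac{1}{2} \sum_{x \in \Omega} p_x \sum_{y \in \Gamma} |p_{xy}/p_x - p_y| \\
  &= \sum_{x \in \Omega} p_x \, \mathrm{dist}(Y|(X = x), Y)
   = \mathbb{E}_X[\mathrm{dist}(Y|X, Y)].
\end{aligned}
$$

□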



13 Here we are abusing the notation and denote by $Y$ the marginal distribution of the
random variable $Y$, and by $Y|(X = a)$ the distribution of the random variable $Y$ conditioned
on the event $X = a$.

Proposition 3.32. Let $\Omega$ be a finite set that is partitioned into subsets
$S_1, \ldots, S_k$ and suppose that $\mathcal{X}$ is a distribution on $\Omega$ that is $\gamma$-close to
uniform. Denote by $p_i$, $i = 1, \ldots, k$, the probability assigned to the event $S_i$ by
$\mathcal{X}$. Then

$$\sum_{i \in [k]} p_i \cdot \mathrm{dist}(\mathcal{X}|S_i, \mathcal{U}_{S_i}) \le 2\gamma.$$

Proof. Let $N := |\Omega|$, and define for each $i$, $\gamma_i := \sum_{s \in S_i} |\Pr_{\mathcal{X}}(s) - \frac{1}{N}|$, so that
$\gamma_1 + \cdots + \gamma_k \le 2\gamma$. Observe that by the triangle inequality, for every $i$ we must
have $|p_i - |S_i|/N| \le \gamma_i$. To conclude the claim, it is enough to show that for
every $i$, we have $\mathrm{dist}(\mathcal{X}|S_i, \mathcal{U}_{S_i}) \le \gamma_i/p_i$. This is shown in the following.
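One way to verify this last inequality is the following computation. It is offered here as a reconstruction under the definitions above (a sketch, not necessarily the author's own derivation), using only the triangle inequality and the bound $|p_i - |S_i|/N| \le \gamma_i$:

```latex
\begin{aligned}
p_i \cdot \mathrm{dist}(\mathcal{X}|S_i, \mathcal{U}_{S_i})
  &= \frac{1}{2} \sum_{s \in S_i} \Bigl| \Pr_{\mathcal{X}}(s) - \frac{p_i}{|S_i|} \Bigr| \\
  &\le \frac{1}{2} \sum_{s \in S_i} \Bigl| \Pr_{\mathcal{X}}(s) - \frac{1}{N} \Bigr|
     + \frac{1}{2} \sum_{s \in S_i} \Bigl| \frac{1}{N} - \frac{p_i}{|S_i|} \Bigr| \\
  &= \frac{\gamma_i}{2} + \frac{1}{2} \Bigl| \frac{|S_i|}{N} - p_i \Bigr|
   \le \gamma_i .
\end{aligned}
```

Summing over $i$ and using $\gamma_1 + \cdots + \gamma_k \le 2\gamma$ then gives the bound stated in the proposition.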
The following proposition shows that any function maps close distributions
to close distributions:

Proposition 3.33. Let $\Omega$ and $\Gamma$ be finite sets and $f$ be a function from $\Omega$ to $\Gamma$.
Suppose that $\mathcal{X}$ and $\mathcal{Y}$ are probability distributions on $\Omega$ and $\Gamma$, respectively,
and let $\mathcal{X}'$ be a probability distribution on $\Omega$ which is $\delta$-close to $\mathcal{X}$. Then if
$f(\mathcal{X}) \sim_{\epsilon} \mathcal{Y}$, then $f(\mathcal{X}') \sim_{\epsilon + \delta} \mathcal{Y}$.

Proof. Let $X$, $X'$ and $Y$ be random variables distributed according to $\mathcal{X}$, $\mathcal{X}'$,
and $\mathcal{Y}$, respectively. We want to upper bound

$$|\Pr[f(X') \in T] - \Pr[Y \in T]|$$

for every $T \subseteq \Gamma$. By the triangle inequality, this is no more than

$$|\Pr[f(X') \in T] - \Pr[f(X) \in T]| + |\Pr[f(X) \in T] - \Pr[Y \in T]|.$$

Here the summand on the right hand side is upper bounded by the distance of
$f(\mathcal{X})$ and $\mathcal{Y}$, that is assumed to be at most $\epsilon$. Let $T' := \{x \in \Omega \mid f(x) \in T\}$.
Then the summand on the left can be written as

$$|\Pr[X' \in T'] - \Pr[X \in T']|,$$

which is at most $\delta$ by the assumption that $\mathcal{X}' \sim_{\delta} \mathcal{X}$. □

Omitted Details of the Proof of Corollary 3.19



Here we prove Corollary 3.19 for the case $c > 1$. The construction is similar to
the case $c = 1$, and in particular the choice of $m$ and $k$ will remain the same.
However, a subtle complication is that the expander family may not have a
graph with $d^m$ vertices and we need to adapt the extractor of Theorem 3.18
to support our parameters, still with exponentially small error. To do so, we
pick a graph $G$ in the family with $N$ vertices, such that

$$c^{\eta m} d^m \le N \le c^{\eta m + 1} d^m$$

for a small absolute constant $\eta > 0$ that we are free to choose. The assumption
on the expander family guarantees that such a graph exists. Let $m'$ be the
smallest integer such that $d^{m'} \ge c^{\eta m} N$. Index the vertices of $G$ by integers in
$[N]$. Note that $m'$ will be larger than $m$ by a constant multiplicative factor
that approaches 1 as $\eta \to 0$.

For positive integers $q$ and $p \le q$, define the function $\mathrm{Mod}_{q,p}\colon [q] \to [p]$ by

$$\mathrm{Mod}_{q,p}(x) := 1 + (x \bmod p).$$

The extractor SFExt interprets the first $m'$ symbols of the input as an integer $u$,
$0 \le u < d^{m'}$, and performs a walk on $G$ starting from the vertex
$\mathrm{Mod}_{d^{m'},N}(u + 1)$, the walk being defined by the remaining input symbols. If the
walk reaches a vertex $v$ at the end, the extractor outputs $\mathrm{Mod}_{N,d^m}(v) - 1$,
encoded as a $d$-ary string of length $m$. A similar argument as in Theorem 3.18 can
show that with our choice of the parameters, the extractor has an exponentially
small error, where the error exponent is now inferior to that of Theorem 3.18 by
$O(m)$, but the constant behind $O(\cdot)$ can be made arbitrarily small by choosing a
sufficiently small $\eta$.

The real difficulty lies with the inverter because Mod is not a balanced
function (that is, not all images have the same number of preimages), thus
we will not be able to obtain a perfect inverter. Nevertheless, it is possible to
construct an inverter with a close-to-uniform output in $\ell_\infty$ norm. This turns
out to be as good as having a perfect inverter, and thanks to the following
lemma, we will still be able to use it to construct a wiretap protocol with zero
leakage:

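Lemma 3.34. Suppose that $f\colon [d]^n \to [d]^m$ is a $(k, 2^{-\Omega(m)})_d$ symbol-fixing
extractor and that $\mathcal{X}$ is a distribution on $[d]^n$ such that
$\|\mathcal{X} - \mathcal{U}_{[d]^n}\|_\infty \le 2^{-\Omega(m)}/d^n$. Denote by $\mathcal{X}'$ the distribution $\mathcal{X}$ conditioned
on any fixing of at most $n - k$ coordinates. Then $f(\mathcal{X}') \sim_{2^{-\Omega(m)}} \mathcal{U}_{[d]^m}$.

Proof. By Proposition 3.33, it suffices to show that $\mathcal{X}'$ is $2^{-\Omega(m)}$-close to an
$(n, k)_d$ symbol-fixing source. Let $S \subseteq [d]^n$ denote the support of $\mathcal{X}'$, and let
$\epsilon/d^n$ be the $\ell_\infty$ distance between $\mathcal{X}$ and $\mathcal{U}_{[d]^n}$, so that by our assumption,
$\epsilon = 2^{-\Omega(m)}$. By the bound on the $\ell_\infty$ distance, we know that $\Pr_{\mathcal{X}}(S)$ is
between $\frac{|S|}{d^n}(1 - \epsilon)$ and $\frac{|S|}{d^n}(1 + \epsilon)$. Hence for any $x \in S$, $\Pr_{\mathcal{X}'}(x)$, which is
$\Pr_{\mathcal{X}}(x)/\Pr_{\mathcal{X}}(S)$, is between $\frac{1}{|S|} \cdot \frac{1 - \epsilon}{1 + \epsilon}$ and $\frac{1}{|S|} \cdot \frac{1 + \epsilon}{1 - \epsilon}$. This differs
from $1/|S|$ by at most $O(\epsilon)/|S|$. Hence, $\mathcal{X}'$ is $2^{-\Omega(m)}$-close to $\mathcal{U}_S$. □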

In order to invert our new construction, we will need to construct an
inverter $\mathrm{Inv}_{q,p}$ for the function $\mathrm{Mod}_{q,p}$. For that, given $x \in [p]$ we will just
sample uniformly in its preimages. This is where the non-balancedness of
Mod causes problems, since if $p$ does not divide $q$ the distribution $\mathrm{Inv}_{q,p}(\mathcal{U}_{[p]})$
is not uniform on $[q]$.

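Lemma 3.35. Suppose that $q > p$. Given a distribution $\mathcal{X}$ on $[p]$ such that
$\|\mathcal{X} - \mathcal{U}_{[p]}\|_\infty \le \frac{\epsilon}{p}$, we have
$\|\mathrm{Inv}_{q,p}(\mathcal{X}) - \mathcal{U}_{[q]}\|_\infty \le \frac{1}{q} \cdot \frac{p + \epsilon q}{q - p}$.

Proof. Let $X \sim \mathcal{X}$ and $Y \sim \mathrm{Inv}_{q,p}(\mathcal{X})$. Since we invert the modulo function
by taking for a given output a random preimage uniformly, $\Pr[Y = y]$ is equal
to $\Pr[X = \mathrm{Mod}_{q,p}(y)]$ divided by the number of $y$ with the same value for
$\mathrm{Mod}_{q,p}(y)$. The latter number is either $\lfloor q/p \rfloor$ or $\lceil q/p \rceil$, so

$$\frac{1 - \epsilon}{p \lceil q/p \rceil} \le \Pr(Y = y) \le \frac{1 + \epsilon}{p \lfloor q/p \rfloor}.$$

Bounding the floor and ceiling functions by $q/p \pm 1$, we obtain

$$\frac{1 - \epsilon}{q + p} \le \Pr(Y = y) \le \frac{1 + \epsilon}{q - p}.$$

That is,

$$\frac{-p - \epsilon q}{q(q + p)} \le \Pr(Y = y) - \frac{1}{q} \le \frac{p + \epsilon q}{q(q - p)},$$

which concludes the proof since this is true for all $y$. □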
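The bound of Lemma 3.35 can be checked exactly on a small instance. The Python sketch below computes the distribution of $\mathrm{Inv}_{q,p}$ applied to the uniform distribution on $[p]$ (so $\epsilon = 0$) using exact rational arithmetic; the function names and the parameters $q = 10$, $p = 3$ are assumptions chosen for illustration.

```python
from fractions import Fraction


def mod_qp(x, p):
    """Mod_{q,p}(x) = 1 + (x mod p), mapping [q] = {1..q} onto [p] = {1..p}."""
    return 1 + (x % p)


def inv_pmf(q, p, pmf_x):
    """Exact pmf on [q] of Inv_{q,p}(X): a uniformly random preimage of X."""
    fibre = {}  # number of preimages of each value in [p]
    for y in range(1, q + 1):
        fibre[mod_qp(y, p)] = fibre.get(mod_qp(y, p), 0) + 1
    return {y: pmf_x[mod_qp(y, p)] / fibre[mod_qp(y, p)] for y in range(1, q + 1)}


def linf_from_uniform(pmf, size):
    """l_infinity distance between pmf and the uniform distribution on `size` points."""
    return max(abs(pr - Fraction(1, size)) for pr in pmf.values())


q, p, eps = 10, 3, 0
uniform_p = {x: Fraction(1, p) for x in range(1, p + 1)}
pmf_y = inv_pmf(q, p, uniform_p)
distance = linf_from_uniform(pmf_y, q)
lemma_bound = Fraction(1, q) * Fraction(p + eps * q, q - p)
```

Here one value of $[3]$ has four preimages in $[10]$ and the other two have three, which is exactly the imbalance the lemma quantifies.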

Now we describe the inverter $\mathrm{Inv}(x)$ for the extractor, again abusing the
notation. First the inverter calls $\mathrm{Inv}_{N,d^m}(x)$ to obtain $x_1 \in [N]$. Then it
performs a random walk on the graph, starting from $x_1$, to reach a vertex $x_2$
at the end which is inverted to obtain $x_3 = \mathrm{Inv}_{d^{m'},N}(x_2)$ as a $d$-ary string of
length $m'$. Finally, the inverter outputs $y = (x_3, w)$, where $w$ corresponds to the
inverse of the random walk of length $n - m'$. It is obvious that this procedure
yields a valid preimage of $x$.

Using the previous lemma, if $x$ is chosen uniformly, $x_1$ will be at $\ell_\infty$-distance

$$\epsilon_1 := \frac{1}{N} \cdot \frac{d^m}{N - d^m} = \frac{1}{N} \, O(c^{-\eta m})$$

from the uniform distribution. For a given walk, the distribution of $x_2$ will just
be a permutation of the distribution of $x_1$ and applying the lemma again (with
$\epsilon = N \epsilon_1$), we see that the $\ell_\infty$-distance of $x_3$ from the uniform distribution is

$$\epsilon_2 = \frac{1}{d^{m'}} \cdot \frac{N + \epsilon_1 N d^{m'}}{d^{m'} - N} = \frac{1}{d^{m'}} \, O(c^{-\eta m}).$$

This is true for all the $d^{n - m'}$ possible walks, so the $\ell_\infty$-distance of the
distribution of $y$ from uniform is bounded by $\frac{1}{d^n} O(c^{-\eta m})$. Applying Lemma 3.34
in an argument similar to Lemma 3.15 concludes the proof.

Domenico Scarlatti (1685-1757): Keyboard Sonata
in B minor K. 87 (L. 33). 



"War does not determine who is 
right — only who is left. " 

— Bertrand Russell 



Chapter 4 

Group Testing 



The history of group testing is believed to date back to the Second World
War. During the war, millions of blood samples taken from draftees had to
be subjected to a certain test, and be analyzed in order to identify a few
thousand cases of syphilis. The tests were identical for all the samples. Here
the idea of group testing came to a statistician called Robert Dorfman (and
perhaps, a few other researchers working together with him, among them
David Rosenblatt). He made a very intuitive observation: the samples
are all subjected to the same test, which is extremely sensitive and
remains reliable even if the sample is diluted. Therefore, it makes sense,
instead of analyzing each sample individually, to pool every few samples in a
group, and apply the test on the mixture of the samples. If the test outcome
is negative, we will be sure that none of the samples participating in the pool
are positive. On the other hand, if the outcome is positive, we know that
one or more of the samples are positive, and will have to proceed with more
refined, or individual, tests in order to identify the individual positives within
the group.

Since the number of positives in the entire population was suspected to be
on the order of a few thousand (a small fraction of the population), Dorfman's
idea would save a great deal of time and resources. Whether or not the idea
was eventually implemented at the time, Dorfman went on to publish
a paper on the topic [49], which triggered an extensive line of research in
combinatorics known today as combinatorial group testing.

The main challenge in group testing is to design the pools in such a way
as to minimize the number of tests required in order to identify the exact set of
positives. Larger groups would save a lot of tests if their outcome is
negative, and are rather wasteful otherwise (since in the latter case they convey a
relatively small amount of information).

Of course the applications of group testing are not limited to blood
sampling. To mention another early example, consider a production line of electric
items such as light bulbs (or resistors, capacitors, etc). As a part of the quality
assurance, defective items have to be identified and discarded. Group testing
can be used to aid this process. Suppose that a group of light bulbs are
connected in series, and an electric current is passed through the circuit. If all the
bulbs are illuminated, we can be sure that none is defective, and otherwise,
we know that at least one is defective.

Since its emergence decades ago, group testing has found a large number
of surprising applications that are too numerous to be extensively treated
here. We particularly refer to applications in molecular biology and DNA
library screening (cf. [18, 58, 102, 113, 130, 163, 164] and the references therein),
multiaccess communication [162], data compression [81], pattern matching
[37], streaming algorithms [38], software testing [14], compressed sensing [139],
and secure key distribution [26], among others. Moreover, entire books are
specifically targeted to combinatorial group testing [50, 51].

In formal terms, the classical group testing problem can be described as
follows. Suppose that we wish to "learn" a Boolean vector of length $n$, namely
$x = (x_1, \ldots, x_n) \in \{0, 1\}^n$, using as few questions as possible. Each question
can ask for a single bit $x_i$, or more generally, specify a group of coordinates
$\mathcal{I} \subseteq [n]$ ($\mathcal{I} \neq \emptyset$) and ask for the bit-wise "or" of the entries at the specified
coordinates; i.e., $\bigvee_{i \in \mathcal{I}} x_i$. We will refer to this type of questions as
disjunctive queries. Obviously, in order to be able to uniquely identify $x$, there is in
general no better way than asking for individual bits $x_1, \ldots, x_n$ (and thus, $n$
questions), since the number of Boolean vectors of length $n$ is $2^n$ and thus,
information theoretically, $n$ bits of information are required to describe an
arbitrary $n$-bit vector. Therefore, without imposing further restrictions on the
possible realizations of the unknown vector, the problem becomes trivial.

Motivated by the blood sampling application that we just described, a
natural restriction that is always assumed in group testing on the unknown vector
$x$ is that it is sparse. Namely, for an integer parameter $d > 0$, we will assume
that the number of nonzero entries of $x$ is at most $d$. We will refer to such a
vector as $d$-sparse. The number of $d$-sparse Boolean vectors is

$$\sum_{i=0}^{d} \binom{n}{i} = 2^{\Theta(d \log(n/d))},$$

and therefore, in principle, any $d$-sparse Boolean vector can be described using
only $O(d \log(n/d))$ bits of information, a number that can be substantially
smaller than $n$ if $d \ll n$. The precise interpretation of the assumption "$d \ll n$"
varies from one setting to another. For a substantial part of this chapter, one
can think of $d = O(\sqrt{n})$. The important question in group testing that we
will address in this chapter is whether the information-theoretic limit
$\Omega(d \log(n/d))$ on the number of questions can be achieved using disjunctive
queries as well.
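To get a feeling for the counting bound, the following Python sketch computes the exact number of bits needed to index all $d$-sparse vectors and compares it with the standard estimates $(n/d)^d \le \sum_{i \le d} \binom{n}{i} \le (en/d)^d$. The function name and the parameters $n = 10000$, $d = 100$ (so that $d = \sqrt{n}$) are assumptions chosen for illustration.

```python
import math


def log2_num_sparse(n, d):
    """log2 of the number of Boolean vectors of length n with at most d ones."""
    return math.log2(sum(math.comb(n, i) for i in range(d + 1)))


n, d = 10_000, 100
bits = log2_num_sparse(n, d)            # information-theoretic description length
lower = d * math.log2(n / d)            # from (n/d)^d <= C(n, d)
upper = d * math.log2(math.e * n / d)   # from sum_{i<=d} C(n, i) <= (e n / d)^d
```

Both estimates are of order $d \log(n/d)$ and far below the $n$ bits needed for an unrestricted vector.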
Notation for this chapter: In this chapter we will be constantly
working with Boolean vectors and their support. The support of a vector
$x = (x_1, \ldots, x_n) \in \{0, 1\}^n$, denoted by $\mathrm{supp}(x)$, is a subset of $[n]$ such that
$i \in \mathrm{supp}(x)$ if and only if $x_i = 1$. Thus the Hamming weight of $x$, that we will
denote by $\mathrm{wgt}(x)$, can be defined as $\mathrm{wgt}(x) = |\mathrm{supp}(x)|$, and a $d$-sparse vector
has the property that $\mathrm{wgt}(x) \le d$.

For a matrix $M$, we denote by $M[i, j]$ the entry of $M$ at the $i$th row and $j$th
column. Moreover, we denote the $i$th entry of a vector $x$ by $x(i)$ (assuming a
one-to-one correspondence between the coordinate positions of $x$ and natural
numbers). For an $m \times n$ Boolean matrix $M$ and $S \subseteq [n]$, we denote by $M|_S$
the $m \times |S|$ submatrix of $M$ formed by restricting $M$ to the columns picked
by $S$.

For non-negative integers $e_0$ and $e_1$, we say that an ordered pair of binary
vectors $(x, y)$, each in $\{0, 1\}^n$, are $(e_0, e_1)$-close (or $x$ is $(e_0, e_1)$-close to $y$) if $y$
can be obtained from $x$ by flipping at most $e_0$ bits from 0 to 1 and at most
$e_1$ bits from 1 to 0. Hence, such $x$ and $y$ will be $(e_0 + e_1)$-close in Hamming
distance. Further, $(x, y)$ are called $(e_0, e_1)$-far if they are not $(e_0, e_1)$-close.
Note that if $x$ and $y$ are seen as characteristic vectors of subsets $X$ and $Y$
of $[n]$, respectively, they are $(|Y \setminus X|, |X \setminus Y|)$-close. Furthermore, $(x, y)$ are
$(e_0, e_1)$-close if and only if $(y, x)$ are $(e_1, e_0)$-close.
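The asymmetric notion of closeness is easy to get backwards, so a small Python sketch may help. The function name `is_close` and the list-of-bits representation are assumptions made for illustration.

```python
def is_close(x, y, e0, e1):
    """True if y arises from x by at most e0 flips 0 -> 1 and at most e1 flips 1 -> 0."""
    zero_to_one = sum(1 for a, b in zip(x, y) if a == 0 and b == 1)
    one_to_zero = sum(1 for a, b in zip(x, y) if a == 1 and b == 0)
    return zero_to_one <= e0 and one_to_zero <= e1


x = [0, 0, 1]
y = [1, 1, 1]  # obtained from x by two 0 -> 1 flips and no 1 -> 0 flips
```

In this example `(x, y)` are (2, 0)-close, and reversing the pair swaps the roles of the two parameters, as noted at the end of the paragraph above.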

4.1 Measurement Designs and Disjunct Matrices 

Suppose that we wish to correctly identify a d-sparse vector
$x \in \{0,1\}^n$ using a reasonable number of disjunctive queries (that we
will simply refer to as "measurements"). In order to do so, consider first the
following simple scheme:

1. If $n \le 2d$, trivially measure the vector by querying
$x_1, \ldots, x_n$ individually.

2. Otherwise, partition the coordinates of x into $\lfloor 2d \rfloor$ blocks
of length either $\lfloor n/(2d) \rfloor$ or $\lceil n/(2d) \rceil$ each, and
query the bitwise "or" of the positions within each block.

3. At least half of the measurement outcomes must be negative, since the
vector x is d-sparse. Recursively run the measurements over the union
of those blocks that have returned positive.

In the above procedure, each recursive call reduces the length of the vector
to half or less, which implies that the depth of the recursion is
$\log(n/2d)$. Moreover, since 2d measurements are made at each level,
altogether we will have $O(d \log(n/d))$ measurements. Therefore, the simple
scheme above is optimal in the sense that it attains the information-theoretic
limit $\Omega(d \log(n/d))$ on the number of measurements, up to constant
factors.

The main problem with this scheme is that the measurements are adaptive in
nature. That is, the choice of the coordinate positions defining each
measurement may depend on the outcomes of the previous measurements. However,
the scheme can be seen as having only $O(\log(n/d))$ adaptive stages. Namely,
each level of the recursion consists of 2d queries whose choices depend on the
query outcomes of the previous levels, but otherwise do not depend on the
outcomes of one another and can be asked in parallel.
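
The simple adaptive scheme described above can be sketched in Python as follows. This is only an illustration under stated assumptions: the hidden vector `x` is a list of 0/1 integers that is d-sparse with d at least 1, each disjunctive query is simulated by the helper `query`, and all names are hypothetical. Each pass through the `while` loop corresponds to one adaptive stage of 2d queries that could be asked in parallel.

```python
def adaptive_group_test(x, d):
    """Recover supp(x) of a d-sparse 0/1 list x using adaptive OR-queries.

    Returns (support, number_of_queries).
    """
    if d < 1:
        raise ValueError("d must be at least 1")
    num_queries = 0

    def query(positions):
        # One disjunctive measurement: the OR of x over the given positions.
        nonlocal num_queries
        num_queries += 1
        return any(x[i] == 1 for i in positions)

    candidates = list(range(len(x)))
    while len(candidates) > 2 * d:
        k = 2 * d
        # Split the candidates into k blocks of nearly equal length.
        size, extra = divmod(len(candidates), k)
        blocks, start = [], 0
        for b in range(k):
            end = start + size + (1 if b < extra else 0)
            blocks.append(candidates[start:end])
            start = end
        positive = [blk for blk in blocks if query(blk)]
        if len(positive) > d:
            raise ValueError("x is not d-sparse")
        # Recurse on the union of the positive blocks.
        candidates = [i for blk in positive for i in blk]
    # Base case: query the remaining coordinates individually.
    support = [i for i in candidates if query([i])]
    return support, num_queries
```

For a vector of length 64 with two defectives, the sketch uses four stages of four queries followed by four individual queries, well below the 64 queries of the trivial scheme.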

Besides being of theoretical interest, for certain applications such as those
in molecular biology, adaptive schemes can be infeasible or too costly, and the
"amortized" cost per test can be substantially lowered when all queries are
specified and fixed before any measurements are performed. Thus, a basic
goal would be to design a measurement scheme that is fully non-adaptive so
that all measurements can be performed in parallel. The trivial scheme, of
course, is an example of a non-adaptive scheme that achieves n measurements.
The question is: how close can one get to the information-theoretic limit
$\Omega(d \log(n/d))$ using a fully non-adaptive scheme? In order to answer
this question, we must study the combinatorial structure of non-adaptive group
testing schemes.

Non-adaptive measurements can be conveniently thought of in a matrix
form, known as the measurement matrix, that is simply the incidence matrix
of the set of queries. Each query can be represented by a Boolean row vector
of length n that is the characteristic vector of the set of indices that
participate in the query. In particular, for a query that takes a subset
$X \subseteq [n]$ of the coordinate positions, the corresponding vector
representation would be the Boolean vector of length n that is supported on the
positions picked by X. Then the measurement matrix is obtained by arranging the
vector encodings of the individual queries as its rows. In particular, the
measurement matrix corresponding to a set of m non-adaptive queries will be the
$m \times n$ Boolean matrix that has a 1 at each position (i, j) if and only if
the jth coordinate participates in the ith query. Under this notation, the
vector of measurement outcomes corresponding to a Boolean vector
$x \in \{0,1\}^n$ and an $m \times n$ measurement matrix M is nothing but the
Boolean vector of length m that is equal to the bit-wise "or" of those columns
of M picked by the support of x. We will denote the vector of measurement
outcomes by M[x]. For example, for the measurement matrix

/0 1 1 1 1 0\ 

10 10 10 1 

M := 01010100 

10 11 

\1 1 1 1 1 0/ 

and Boolean vector x := (1,1,0,1,0,0,0,0), we have M[x] = (1,1,1,0,1)
which is the bit-wise "or" of the columns shown in boldface (the first,
second, and fourth columns).

Now suppose that the measurement matrix M is chosen so that it can be
used to distinguish between any two d-sparse vectors. In particular, for every
set $S \subseteq [n]$ of indices such that $|S| \le d - 1$, d being the
sparsity parameter, the $(d-1)$-sparse vector $x \in \{0,1\}^n$ supported on S
must be distinguishable from the d-sparse vector $x' \in \{0,1\}^n$ supported
on $S \cup \{i\}$, for any arbitrary index $i \in [n] \setminus S$. Now observe
that the Boolean function "or" is monotone. Namely, for a Boolean vector
$(a_1, \ldots, a_n) \in \{0,1\}^n$ that is monotonically less than or equal to
another vector $(b_1, \ldots, b_n) \in \{0,1\}^n$ (i.e., for every
$j \in [n]$, $a_j \le b_j$), it must be that

\[ \bigvee_{j \in [n]} a_j \le \bigvee_{j \in [n]} b_j. \]

Therefore, since we have chosen x and x' so that
$\mathrm{supp}(x) \subseteq \mathrm{supp}(x')$, we must have
$\mathrm{supp}(M[x]) \subseteq \mathrm{supp}(M[x'])$. Since by assumption, M[x]
and M[x'] must differ in at least one position, at least one of the rows of M
must have an entry 1 at the ith column but all zeros at those corresponding to
the set S. This is the idea behind the classical notion of disjunct matrices,
formally defined below (in a slightly generalized form).
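
The measurement map and its monotonicity can be illustrated with a short Python sketch. The function name `measure` and the small 3 x 4 matrix are hypothetical choices made for this illustration (the matrix is not the example matrix of the text); matrices are assumed to be lists of 0/1 rows.

```python
def measure(M, x):
    """Return M[x]: the bit-wise OR of the columns of M picked by supp(x)."""
    return [int(any(row[j] and x[j] for j in range(len(x)))) for row in M]


# A small illustrative measurement matrix with 3 queries over 4 coordinates.
M = [
    [1, 0, 1, 0],
    [0, 1, 0, 0],
    [0, 0, 1, 1],
]
x_small = [1, 0, 0, 0]   # supported on {0}
x_large = [1, 0, 0, 1]   # supported on {0, 3}, a superset of supp(x_small)

y_small = measure(M, x_small)
y_large = measure(M, x_large)
# Monotonicity: every positive outcome for x_small stays positive for x_large.
monotone = all(a <= b for a, b in zip(y_small, y_large))
```

In this instance the two vectors are distinguished by the third query, which contains coordinate 3 but no coordinate from the support of the smaller vector.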

Definition 4.1. For integer parameters $d, e \ge 0$ (respectively called the
sparsity parameter and noise tolerance), a Boolean matrix is (d, e)-disjunct if
for every choice of d + 1 distinct columns $C_0, C_1, \ldots, C_d$ of the
matrix we have

\[ \Big| \mathrm{supp}(C_0) \setminus \bigcup_{i=1}^{d} \mathrm{supp}(C_i) \Big| > e. \]

A (d, 0)-disjunct matrix is simply called d-disjunct.
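
For small matrices the definition can be checked directly by exhaustive search. The following Python sketch is one way to do so; the name `is_disjunct` is an assumption of this illustration, and the running time grows like $n^{d+1}$, so it is meant only for toy examples.

```python
from itertools import combinations


def is_disjunct(M, d, e=0):
    """Brute-force test of (d, e)-disjunctness for a 0/1 matrix M (list of rows)."""
    n = len(M[0])
    # Support of each column, as a set of row indices.
    cols = [{r for r, row in enumerate(M) if row[j]} for j in range(n)]
    for S in combinations(range(n), d):
        covered = set().union(*(cols[j] for j in S))
        for i in range(n):
            # Column i must keep more than e ones outside the union of S.
            if i not in S and len(cols[i] - covered) <= e:
                return False
    return True


identity4 = [[1 if r == c else 0 for c in range(4)] for r in range(4)]
```

The identity matrix is d-disjunct for every d smaller than its dimension, but it uses n rows; it tolerates no noise, since every column has only a single private row.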

In the discussion preceding the above definition we saw that the notion of
$(d-1)$-disjunct matrices is necessary for non-adaptive group testing, in that
any non-adaptive measurement scheme must correspond to a $(d-1)$-disjunct
matrix. It turns out that this notion is also sufficient, and thus precisely
captures the combinatorial structure needed for non-adaptive group testing.

Theorem 4.2. Suppose that M is an $m \times n$ matrix that is (d, e)-disjunct.
Then for every pair of distinct d-sparse vectors $x, x' \in \{0,1\}^n$ such
that $\mathrm{supp}(x) \nsubseteq \mathrm{supp}(x')$, we have

\[ |\mathrm{supp}(M[x]) \setminus \mathrm{supp}(M[x'])| > e. \tag{4.1} \]

Conversely, if M is such that (4.1) holds for every choice of x, x' as above,
then it must be $(d-1, e)$-disjunct.

Proof. For the forward direction, let $S := \mathrm{supp}(x')$ and
$i \in \mathrm{supp}(x) \setminus \mathrm{supp}(x')$. Then Definition 4.1
implies that there is a set $E \subseteq [m]$ of rows of M such that
$|E| > e$ and for every $j \in E$, we have $M[j, i] = 1$ and the jth row of
M restricted to the columns in S (i.e., the support of x') entirely consists of
zeros. Thus, the measurement outcomes for x' at positions in E must be zeros
while those measurements have a positive outcome for x (since they include at
least one coordinate, namely i, on the support of x). Therefore, (4.1) holds.

For the converse, consider any set $S \subseteq [n]$ of size at most $d - 1$
and $i \in [n] \setminus S$. Consider d-sparse vectors
$x, x' \in \{0,1\}^n$ such that $\mathrm{supp}(x') := S$ and
$\mathrm{supp}(x) := S \cup \{i\}$. By assumption, there must be a set
$E \subseteq [m]$ of size larger than e such that, for every $j \in E$, we have
$M[x](j) = 1$ but $M[x'](j) = 0$. This implies that on those rows of M that are
picked by E, the ith entry must be one while those corresponding to S must be
zeros. Therefore, M is $(d-1, e)$-disjunct. □

From the above theorem we know that the measurement outcomes corresponding
to distinct d-sparse vectors differ from one another in more than
e positions provided that the measurement matrix is (d, e)-disjunct. When
e > 0, this would allow for distinguishability of sparse vectors even in the
presence of noise. Namely, even if up to $\lfloor e/2 \rfloor$ of the
measurements are allowed to be incorrect, it would still be possible to
uniquely reconstruct the vector being measured. For this reason, we have called
the parameter e the "noise tolerance".

4.1.1 Reconstruction 

So far we have focused on combinatorial distinguishability of sparse vectors. 
However, for applications unique distinguishability is by itself not sufficient 
and it is important to have efficient "decoding" algorithms to reconstruct the 
vector being measured. 

Fortunately, monotonicity of the "or" function substantially simplifies the
decoding problem. In particular, if two Boolean vectors x, x' such that the
support of x is not entirely contained in that of x' are distinguishable by
a measurement matrix, adding new elements to the support of x will never
make it "less distinguishable" from x'. Moreover, observe that the proof of
Theorem 4.2 never uses sparsity of the vector x. Therefore we see that
(d, e)-disjunct matrices are not only able to distinguish between d-sparse
vectors, but moreover, the only Boolean vector (be it sparse or not) that may
reproduce the measurement outcomes resulting from a d-sparse vector
$x \in \{0,1\}^n$ is x itself. Thus, given a vector of measurement outcomes, in
order to reconstruct the sparse vector being measured it suffices to produce
any vector that is consistent with the measurement outcomes. This observation
leads us to the following simple decoding algorithm, that we will call the
distance decoder:

1. Given a measurement outcome $\hat{y} \in \{0,1\}^m$, identify the set
$S_{\hat{y}} \subseteq [n]$ of the column indices of the measurement matrix M
such that each $i \in [n]$ is in $S_{\hat{y}}$ if and only if the ith column of
M, denoted by $c_i$, satisfies

\[ |\mathrm{supp}(c_i) \setminus \mathrm{supp}(\hat{y})| \le \lfloor e/2 \rfloor. \]

2. The reconstruction outcome $\hat{x} \in \{0,1\}^n$ is the Boolean vector
supported on $S_{\hat{y}}$.

Lemma 4.3. Let $x \in \{0,1\}^n$ be d-sparse and y := M[x], where the
measurement matrix M is (d, e)-disjunct. Suppose that a measurement outcome
$\hat{y}$ that has Hamming distance at most $\lfloor e/2 \rfloor$ with y is
given to the distance decoder. Then the outcome $\hat{x}$ of the decoder is
equal to x.

Proof. Since the distance decoder allows for a "mismatch" of size up to
$\lfloor e/2 \rfloor$ for the columns picked by the set $S_{\hat{y}}$, we
surely know that
$\mathrm{supp}(x) \subseteq S_{\hat{y}} = \mathrm{supp}(\hat{x})$. Now suppose
that there is an index $i \in [n]$ such that $i \in S_{\hat{y}}$ but
$i \notin \mathrm{supp}(x)$. Since M is (d, e)-disjunct, we know that for the
ith column $c_i$ we have

\[ |\mathrm{supp}(c_i) \setminus \mathrm{supp}(y)| > e. \]

On the other hand, since $i \in S_{\hat{y}}$, it must be that

\[ |\mathrm{supp}(c_i) \setminus \mathrm{supp}(\hat{y})| \le \lfloor e/2 \rfloor, \]

and moreover, by assumption we have that

\[ |\mathrm{supp}(\hat{y}) \setminus \mathrm{supp}(y)| \le \lfloor e/2 \rfloor. \]

Together, the last two inequalities give
$|\mathrm{supp}(c_i) \setminus \mathrm{supp}(y)| \le 2\lfloor e/2 \rfloor \le e$.
This is a contradiction. Therefore we must have
$S_{\hat{y}} \subseteq \mathrm{supp}(x)$, implying that $\hat{x} = x$. □
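
The distance decoder translates almost line by line into code. The sketch below is a minimal Python rendering under the assumption that the matrix is a list of 0/1 rows and the noisy outcome is a list of 0/1 values; the function name `distance_decode` is chosen here for illustration.

```python
def distance_decode(M, y_hat, e):
    """Distance decoder: keep column i iff at most floor(e/2) of its ones
    fall on rows whose (possibly noisy) outcome in y_hat is 0."""
    m, n = len(M), len(M[0])
    threshold = e // 2
    x_hat = []
    for i in range(n):
        mismatches = sum(1 for r in range(m) if M[r][i] == 1 and y_hat[r] == 0)
        x_hat.append(1 if mismatches <= threshold else 0)
    return x_hat


# Three stacked copies of the 4 x 4 identity: every column has 3 private rows,
# so the matrix is (d, 2)-disjunct for d <= 3 and tolerates one wrong outcome.
M = [[1 if r % 4 == c else 0 for c in range(4)] for r in range(12)]
y_noisy = [0, 0, 1, 0, 1, 0, 1, 0, 1, 0, 1, 0]  # M[(1,0,1,0)] with outcome 0 flipped
```

The decoder makes a single pass over the matrix, so its running time is linear in the size of M.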

4.1.2 Bounds on Disjunct Matrices 

So far we have seen that the notion of disjunct matrices is all we need for 
non-adaptive group testing. But how small can the number of rows of such 
matrices be? Equivalently, what is the smallest number of measurements 
required by a non-adaptive group testing scheme that can correctly identify
the support of d-sparse vectors?

4.1.2.1 Upper and Lower Bounds 

In the following, we use the probabilistic method to show that a randomly
constructed matrix is disjunct with overwhelming probability, and thus obtain
an upper bound on the number of rows of disjunct matrices.

Theorem 4.4. Let $p \in [0, 1)$ be an arbitrary real parameter, and d, n be
integer parameters such that d < n. Consider a random $m \times n$ Boolean
matrix M such that each entry of M is, independently, chosen to be 1 with
probability q := 1/d. Then there is an
$m_0 = O(d^2 \log(n/d)/(1-p)^2)$ and $e = \Omega(pm/d)$ such that M is
(d, e)-disjunct with probability $1 - o(1)$ provided that $m \ge m_0$.

Proof. Consider any set S of d columns of M, and any column outside those,
say the ith column where $i \notin S$. First we upper bound the probability of
a failure for this choice of S and i, i.e., the probability that the number of
rows at which the ith column has a 1 while all the columns in S have zeros is
at most e. Clearly if this event happens the (d, e)-disjunct property of
M would be violated. On the other hand, if for no choice of S and i a failure
happens the matrix would be indeed (d, e)-disjunct.

Now we compute the failure probability $p_f$ for a fixed S and i. A row
is good if at that row the ith column has a 1 but all the columns in S have
zeros. For a particular row, the probability that the row is good is
$q(1-q)^d$. Then failure corresponds to the event that the number of good rows
is at most e. The distribution of the number of good rows is binomial with mean
$\mu = q(1-q)^d m$. Choose $e := pmq(1-q)^d = \Omega(pm/d)$. By a Chernoff
bound, the failure probability is at most

\[ p_f \le \exp\big(-(\mu - e)^2/(2\mu)\big) \le \exp\big(-mq(1-p)^2/6\big), \]

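where the second inequality is due to the fact that
$(1-q)^d = (1-1/d)^d$ lies between 1/3 and 1/2 for every $d \ge 6$ (for
$2 \le d \le 5$ it is still at least 1/4, which only affects the constant in
the exponent).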

Now if we apply a union bound over all possible choices of S and i, the
probability of coming up with a bad choice of M would be at most

\[ n \binom{n-1}{d} \exp\big(-mq(1-p)^2/6\big). \]

This probability vanishes so long as $m \ge m_0$ for some
$m_0 = O(d^2 \log(n/d)/(1-p)^2)$. □
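
To make the construction and the final union bound concrete, the following Python sketch samples a random design with density q = 1/d and evaluates the bound $n\binom{n-1}{d}\exp(-mq(1-p)^2/6)$ numerically. The function names and the seed parameter are assumptions of this illustration; the constant 6 in the exponent relies on $(1-1/d)^d \ge 1/3$, which holds for $d \ge 6$.

```python
import math
import random


def random_design(m, n, d, seed=None):
    """Sample an m x n 0/1 matrix whose entries are 1 independently with prob. 1/d."""
    rng = random.Random(seed)
    q = 1.0 / d
    return [[1 if rng.random() < q else 0 for _ in range(n)] for _ in range(m)]


def union_bound(m, n, d, p=0.0):
    """Union bound from the proof on the probability that the random design
    fails to be (d, e)-disjunct, with e = p * m * q * (1 - q)**d."""
    q = 1.0 / d
    choices = n * math.comb(n - 1, d)  # one column i, then d other columns S
    return choices * math.exp(-m * q * (1 - p) ** 2 / 6)
```

For instance, with n = 100 and d = 6 the bound is far above 1 (and hence vacuous) for m = 100, but drops below $10^{-6}$ for m = 2000, reflecting the quadratic dependence of $m_0$ on d.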

The above result shows, in particular, that d-disjunct matrices with n
columns and $O(d^2 \log(n/d))$ rows exist. This is off from the
information-theoretic barrier $O(d \log(n/d))$ by a multiplicative factor
$O(d)$, which raises the question of whether better disjunct matrices can be
found. In the literature of group testing, combinatorial lower bounds on the
number of rows of disjunct matrices exist, which show that the above upper
bound is almost the best one can hope for. In particular, D'yachkov and Rykov
[54] have shown that the number of rows of any d-disjunct matrix has to be
$\Omega(d^2 \log_d n)$. Several other concrete lower bounds on the size of
disjunct matrices are known, which are all asymptotically equivalent (e.g.,
[63, 129]). Moreover, for a nonzero noise tolerance e, the lower bounds can be
extended to $\Omega(d^2 \log_d n + ed)$.

4.1.2.2 The Fixed-Input Case 



The probabilistic construction of disjunct matrices presented in Theorem 4.4
almost surely produces a disjunct matrix using $O(d^2 \log(n/d))$
measurements. Obviously, due to the almost-matching lower bounds, the
disjunctness property can no longer be assured once the number of measurements
is lowered. However, the randomized nature of the designs can be used to our
benefit to show that, using merely $O(d \log n)$ measurements (almost matching
the information-theoretic lower bound), it is possible (with overwhelming
probability) to distinguish a "fixed" d-sparse vector from any other (not
necessarily sparse) vector. More precisely we have the following result, whose
proof is quite similar to that of Theorem 4.4.

Theorem 4.5. Let $p \in [0, 1)$ be an arbitrary real parameter, d, n be integer
parameters such that d < n, and $x \in \{0,1\}^n$ be a fixed d-sparse vector.
Consider a random $m \times n$ Boolean matrix M such that each entry of M
is, independently, chosen to be 1 with probability q := 1/d. Then there is an
$m_0 = O(d(\log n)/(1-p)^2)$ and $e = \Omega(pm/d)$ such that, provided that
$m \ge m_0$, with probability $1 - o(1)$ the following holds: For every
$y \in \{0,1\}^n$, $y \ne x$, the Hamming distance between the outcomes M[y]
and M[x] is greater than e.



Proof. We follow essentially the same argument as the proof of Theorem 4.4,
but will need a weaker union bound at the end. Call a column i of M good if
there are more than e rows of M at which the ith column has a 1 but those
on the support of x (excluding the ith column) have zeros. Now we can follow
the argument in the proof of Theorem 4.4 to show that under the conditions
of the theorem, with probability $1 - o(1)$, all columns of M are good (the
only difference is that the last union bound will enumerate a set of n
possibilities rather than $n\binom{n-1}{d}$).

Now suppose that for the particular outcome of M all columns are good,
and take any $y \in \{0,1\}^n$, $y \ne x$. One of the following cases must be
true, and in either case, we show that M[x] and M[y] are different at more than
e positions:

1. There is an $i \in \mathrm{supp}(y) \setminus \mathrm{supp}(x)$: Since the
ith column is good, we know that for more than e rows of M, the entry at the
ith column is 1 while those at supp(x) are all zeros. This implies that at
positions corresponding to such rows, M[y] must be 1 but M[x] must be zero.

2. We have $\mathrm{supp}(y) \subseteq \mathrm{supp}(x)$: In this case, take
any $i \in \mathrm{supp}(x) \setminus \mathrm{supp}(y)$, and again use the fact
that the ith column is good to conclude that at more than e positions the
outcome M[y] must be zero but M[x] must be 1.

□ 
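
The "good column" condition used in this proof is easy to test for a given matrix and a given fixed vector. The following Python sketch checks it directly; the name `all_columns_good` and the toy matrix are assumptions of this illustration.

```python
def all_columns_good(M, x, e):
    """Check that every column i of M has more than e rows where column i is 1
    while all columns on supp(x), other than i itself, are 0."""
    m, n = len(M), len(x)
    support = [j for j in range(n) if x[j] == 1]
    for i in range(n):
        others = [j for j in support if j != i]
        good_rows = sum(
            1 for r in range(m)
            if M[r][i] == 1 and not any(M[r][j] for j in others)
        )
        if good_rows <= e:
            return False
    return True


# Two stacked copies of the 3 x 3 identity: each column has two private rows.
M = [[1 if r % 3 == c else 0 for c in range(3)] for r in range(6)]
```

Unlike disjunctness, this condition involves only n columns for the fixed x, which is why the union bound in the proof is over n events.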

As a corollary, the above theorem shows that, with overwhelming probability,
once we fix the outcome of the random matrix M constructed by the
theorem, the matrix M will be able to distinguish between most d-sparse
vectors even in the presence of up to $\lfloor e/2 \rfloor$ incorrect
measurement outcomes. In particular, we get an average-case result: there is a
fixed measurement scheme with only $O(d \log n)$ measurements using which it is
possible to uniquely reconstruct a randomly chosen d-sparse vector (e.g., under
the uniform distribution) with overwhelming probability over the distribution
from which the sparse vector is drawn.

4.1.2.3 Sparsity of the Measurements 



The probabilistic construction of Theorem 4.4 results in a rather sparse matrix, 
namely, one with density q = 1/d that decays with the sparsity parameter d. 
Below we show that sparsity is a necessary condition for the probabilistic 
construction to work at an optimal level on the number of measurements: 

Lemma 4.6. Let M be an $m \times n$ Boolean random matrix, where
$m = O(d^2 \log n)$ for an integer d > 0, which is constructed by setting each
entry independently to 1 with probability q. Then either
$q = O(\log d / d)$ or otherwise the probability that M is (d, e)-disjunct (for
any $e \ge 0$) approaches zero as n grows.

Proof. Suppose that M is an $m \times n$ matrix that is (d, e)-disjunct.
Observe that, for any integer $t \in (0, d)$, if we remove any t columns of M
and all the rows on the support of those columns, the matrix must remain
$(d-t, e)$-disjunct. This is because any counterexample for the modified matrix
being $(d-t, e)$-disjunct can be extended to a counterexample for M being
(d, e)-disjunct by adding the removed columns to its support.

Now consider any t columns of M, and denote by $m_0$ the number of rows
of M at which the entries corresponding to the chosen columns are all zeros.
The expected value of $m_0$ is $(1-q)^t m$. Moreover, for any constant
$\delta > 0$ we have

\[ \Pr[m_0 > (1+\delta)(1-q)^t m] \le \exp\big(-\delta^2 (1-q)^t m / 4\big) \tag{4.2} \]

by a Chernoff bound. 

Let $t_0$ be the largest integer for which

\[ (1+\delta)(1-q)^{t_0} m \ge \log n. \]

If $t_0 < d - 1$, we let $t := 1 + t_0$ above, and this makes the right hand
side of (4.2) upper bounded by o(1). So with probability $1 - o(1)$, the chosen
t columns of M will keep $m_0$ at most $(1+\delta)(1-q)^t m$, and removing
those columns and the rows on the union of their supports leaves $m_0$ rows
forming a $(d - t_0 - 1, e)$-disjunct matrix, which obviously requires at least
log n rows (as even a (1, 0)-disjunct matrix needs so many rows). Therefore, we
must have

\[ (1+\delta)(1-q)^t m \ge \log n \]

or otherwise (with overwhelming probability) M will not be (d, e)-disjunct.
But the latter inequality is not satisfied by the assumption on $t_0$. So if
$t_0 < d - 1$, little chance remains for M to be (d, e)-disjunct.

Now consider the case $t_0 \ge d - 1$. Thus, by the choice of $t_0$, we must
have

\[ (1+\delta)(1-q)^{d-1} m \ge \log n. \]

The above inequality implies that we must have

\[ q \le \frac{\log(m(1+\delta)/\log n)}{d-1}, \]

which, for $m = O(d^2 \log n)$, gives $q = O(\log d / d)$. □

4.2 Noise resilient schemes and approximate reconstruction

So far, we have introduced the notion of (d, e)-disjunct matrices that can be
used in non-adaptive group testing schemes to identify d-sparse vectors up to
a number of measurement errors depending on the parameter e. However, as
the existing lower bounds suggest, the number of rows of such matrices cannot
reach the information-theoretic optimum $O(d \log(n/d))$ and moreover, the
noise tolerance e can be at most a factor 1/d of the number of measurements.
This motivates two natural questions:

1. Can the number of measurements be lowered at the cost of causing
a slight amount of "confusion"? We know, by Theorem 4.5, that it
is possible to identify sparse vectors on average using only $O(d \log n)$
measurements. But can something be said in the worst case?

2. What can be said if the amount of possible errors can be substantially 
high; e.g., when a constant fraction of the measurements can produce 
false outcomes? 

In order to answer the above questions, in this section we introduce a
notion of measurement schemes that can be "more flexible" than that of disjunct
matrices, and aim to study the trade-off between the amount of errors
expected on the measurements versus the ambiguity of the reconstruction. More
formally we define the following notion.

Definition 4.7. Let $m, n, d, e_0, e_1, e'_0, e'_1$ be integers. An
$m \times n$ measurement matrix A is called $(e_0, e_1, e'_0, e'_1)$-resilient
for d-sparse vectors if, for every $y \in \{0,1\}^m$ there exists
$z \in \{0,1\}^n$ (called a valid decoding of y) such that for every
$x \in \{0,1\}^n$, whenever (x, z) are $(e'_0, e'_1)$-far, (A[x], y) are
$(e_0, e_1)$-far.¹

The matrix A is called explicit if it can be computed in polynomial time
in its size, and fully explicit if each entry of the matrix can be computed in
time poly(m, log n).

¹ In particular this means that for every $x, x' \in \{0,1\}^n$, if
(A[x], A[x']) are $(e_0, e_1)$-close, then x and x' must be
$(e'_0 + e'_1, e'_0 + e'_1)$-close.

Intuitively, the definition states that two measurements are allowed to be
confused only if they are produced from close vectors. The parameters $e_0$
and $e'_0$ correspond to the amount of tolerable false positives on the
measurement outcomes and the reconstructed vector, respectively, where by a
false positive we mean an error caused by mistaking a 0 for 1. Similarly,
$e_1$ and $e'_1$ define the amount of tolerable false negatives on both sides,
where a false negative occurs when a bit that actually must be 1 is flipped
to 0.

In particular, an $(e_0, e_1, e'_0, e'_1)$-resilient matrix gives a group
testing scheme that reconstructs the sparse vector up to $e'_0$ false positives
and $e'_1$ false negatives even in the presence of $e_0$ false positives and
$e_1$ false negatives in the measurement outcome. Under this notation, unique
(exact) decoding would be possible using an $(e_0, e_1, 0, 0)$-resilient matrix
if the amount of measurement errors is bounded by at most $e_0$ false positives
and $e_1$ false negatives. However, when $e'_0 + e'_1$ is positive, decoding
may require a bounded amount of ambiguity, namely, up to $e'_0$ false positives
and $e'_1$ false negatives in the decoded sequence.

Observe that the special case of (0, 0, 0, 0)-resilient matrices corresponds
to the classical notion of d-disjunct matrices, while a (d, e)-disjunct matrix
would give a $(\lfloor e/2 \rfloor, \lfloor e/2 \rfloor, 0, 0)$-resilient
matrix for d-sparse vectors.

Definition 4.7 is in fact reminiscent of list-decoding in error-correcting
codes, but with the stronger requirement that the list of decoding possibilities
must consist of vectors that are close to one another.

4.2.1 Negative Results 

In coding theory, it is possible to construct codes that can tolerate up to a 
constant fraction of adversarially chosen errors and still guarantee unique de- 
coding. Hence it is natural to wonder whether a similar possibility exists in 
group testing, namely, whether there is a measurement matrix that is robust 
against a constant fraction of adversarial errors and still recovers the mea- 
sured vector exactly. We already have mentioned that this is in general not 
possible, since any (d, e)-disjunct matrix (a notion that is necessary for this 
task) requires at least de rows, and thus the fraction of tolerable errors by 
disjunct matrices cannot be above 1/d. Below we extend this result to the 
more "asymmetric" notion of resilient matrices, and show that the fraction of 
tolerable false positives and false negatives must be both below 1/d. 

Lemma 4.8. Suppose that an $m \times n$ measurement matrix M is
$(e_0, e_1, e'_0, e'_1)$-resilient for d-sparse vectors. Then
$(\max\{e_0, e_1\} + 1)/(e'_0 + e'_1 + 1) \le m/d$.

Proof. We use similar arguments as those used in [20, 75] in the context of
black-box hardness amplification in NP: Define a partial ordering $\prec$
between binary vectors using bit-wise comparisons (with 0 < 1). Let
$t := d/(e'_0 + e'_1 + 1)$ be an integer², and consider any monotonically
increasing sequence of vectors $x_0 \prec \cdots \prec x_t$ in $\{0,1\}^n$
where $x_i$ has weight $i(e'_0 + e'_1 + 1)$. Thus, $x_0$ and $x_t$ will have
weights zero and d, respectively. Note that we must also have
$M[x_0] \prec \cdots \prec M[x_t]$ due to monotonicity of the "or" function.



A fact that is directly deduced from Definition 4.7 is that, for every
$x, x' \in \{0,1\}^n$, if (M[x], M[x']) are $(e_0, e_1)$-close, then x and x'
must be $(e'_0 + e'_1, e'_0 + e'_1)$-close. This can be seen by setting
y := M[x'] in the definition, for which there exists a valid decoding
$z \in \{0,1\}^n$. As (M[x], y) are $(e_0, e_1)$-close, the definition implies
that (x, z) must be $(e'_0, e'_1)$-close. Moreover, (M[x'], y) are (0, 0)-close
and thus, $(e_0, e_1)$-close, which implies that (z, x') must be
$(e'_1, e'_0)$-close. Thus by the triangle inequality, (x, x') must be
$(e'_0 + e'_1, e'_0 + e'_1)$-close.

Now, observe that for all i, $(x_i, x_{i+1})$ are
$(e'_0 + e'_1, e'_0 + e'_1)$-far, and hence, their encodings must be
$(e_0, e_1)$-far, by the fact we just mentioned. In particular this implies
that $M[x_t]$ must have weight at least $t(e_0 + 1)$, which must be trivially
upper bounded by m. Hence it follows that
$(e_0 + 1)/(e'_0 + e'_1 + 1) \le m/d$. Similarly we can also show that
$(e_1 + 1)/(e'_0 + e'_1 + 1) \le m/d$. □
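
The inequality of Lemma 4.8 can be rearranged to give the largest number of measurement errors of either kind that a resilient matrix could possibly tolerate. The following Python sketch does this arithmetic; the function name and the sample parameters are illustrative assumptions.

```python
def max_measurement_errors(m, d, e0_rec, e1_rec):
    """Largest value of max(e0, e1) allowed by Lemma 4.8 for an m-row matrix
    that is resilient for d-sparse vectors with reconstruction ambiguity
    (e0_rec, e1_rec), i.e. (max + 1) / (e0_rec + e1_rec + 1) <= m / d."""
    return (m * (e0_rec + e1_rec + 1)) // d - 1
```

With m = 1000 and d = 20, exact reconstruction allows at most 49 errors of each kind (just under a 1/d fraction of the measurements), while allowing one false positive and one false negative in the reconstruction raises the ceiling to 149.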

As shown by the lemma above, tolerance of a measurement matrix against
a constant fraction of errors would make an ambiguity of order $\Omega(d)$ in
the decoding inevitable, irrespective of the number of measurements. For most
applications this might be an unsatisfactory situation, as even a close estimate
of the set of positives might not reveal whether any particular individual is
defective or not, and in certain scenarios (such as an epidemic disease or
industrial quality assurance) it is unacceptable to miss any defective
individuals. This motivates us to focus on approximate reconstructions with
one-sided error. Namely, we will require the support of the reconstruction
$\hat{x}$ to always contain the support of the original vector x being
measured, and be possibly larger by up to $O(d)$ positions. It can be argued
that, for most applications, such a scheme is as good as exact reconstruction,
as it allows one to significantly narrow down the set of defectives to up to
$O(d)$ candidate positives. In particular, as observed in [93], one can use a
second stage if necessary and individually test the resulting set of
candidates, using more reliable measurements, to identify the exact set of
positives. In the literature, such schemes are known as trivial two-stage
schemes.
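
The second stage of a trivial two-stage scheme is simple enough to sketch in a few lines of Python. Here `candidates` stands for the output of some first-stage reconstruction with no false negatives, and `test_item` is a hypothetical callback modelling a reliable individual test; both names are assumptions of this illustration.

```python
def trivial_two_stage(candidates, test_item):
    """Second stage: test every first-stage candidate individually and keep
    exactly those that test positive."""
    return sorted(i for i in set(candidates) if test_item(i))


# Toy usage: the true vector has defectives at positions 1 and 4, and the
# first stage returned a superset of the support with one false positive.
x = [0, 1, 0, 0, 1, 0]
first_stage_output = [1, 2, 4]
defectives = trivial_two_stage(first_stage_output, lambda i: x[i] == 1)
```

Since the first stage returns $O(d)$ candidates, the second stage adds only $O(d)$ individual tests to the overall count.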

² For the sake of simplicity in this presentation we ignore the fact that
certain fractions might in general give non-integer values. However, it should
be clear that this will cause no loss of generality.

The trade-off given by the following lemma only focuses on false negatives
and is thus useful for trivial two-stage schemes:

Lemma 4.9. Suppose that an $m \times n$ measurement matrix M is
$(e_0, e_1, e'_0, e'_1)$-resilient for d-sparse vectors. Then for every
$\epsilon > 0$, either

\[ e_1 < \frac{(e'_1 + 1)m}{\epsilon d} \]

or

\[ e'_0 \ge \frac{(1-\epsilon)(n - d + 1)}{(e'_1 + 1)^2}. \]

Proof. Let $x \in \{0,1\}^n$ be chosen uniformly at random among vectors of
weight d. Randomly flip $e'_1 + 1$ of the bits on the support of x to 0, and
denote the resulting vector by x'. Using the partial ordering $\prec$ in the
proof of the last lemma, it is obvious that $x' \prec x$, and hence,
$M[x'] \preceq M[x]$. Let b denote any disjunction of a number of coordinates
in x and b' the same disjunction in x'. We must have

\[ \Pr[b' = 0 \mid b = 1] \le \frac{e'_1 + 1}{d}, \]

as for b to be 1 at least one of the variables on the support of x must be
present in the disjunction and one particular such variable must necessarily
be flipped to bring the value of b' down to zero. Using this, the expected
Hamming distance between M[x] and M[x'] can be bounded as follows:

\[ \mathbb{E}[\mathrm{dist}(M[x], M[x'])] = \mathbb{E}\Big[\sum_{i \in [m]} \mathbb{1}(M[x]_i = 1 \wedge M[x']_i = 0)\Big] \le \frac{e'_1 + 1}{d} \cdot m, \]

where the expectation is over the randomness of x and the bit flips,
dist(·, ·) denotes the Hamming distance between two vectors, and
$\mathbb{1}(\cdot)$ denotes an indicator predicate.

Fix a particular choice of x' that keeps the expectation at most
$(e'_1 + 1)m/d$. Now the randomness is over the possibilities of x, that is,
flipping up to $e'_1 + 1$ zero coordinates of x' randomly. Denote by
$\mathcal{X}$ the set of possibilities of x for which M[x] and M[x'] are
$\frac{(e'_1+1)m}{\epsilon d}$-close, and by $\mathcal{S}$ the set of all
vectors that are monotonically larger than x' and are $(e'_1 + 1)$-close to it.
Obviously, $\mathcal{X} \subseteq \mathcal{S}$, and, by Markov's inequality, we
know that $|\mathcal{X}| \ge (1-\epsilon)|\mathcal{S}|$.

Let z be any valid decoding of M[x']. Thus, (x', z) must be
$(e'_0, e'_1)$-close. Now assume that
$e_1 \ge \frac{(e'_1+1)m}{\epsilon d}$ and consider any $x \in \mathcal{X}$.
Hence, (M[x], M[x']) are $(e_0, e_1)$-close and (x, z) must be
$(e'_0, e'_1)$-close by Definition 4.7. Regard x, x', z as the characteristic
vectors of sets $X, X', Z \subseteq [n]$, respectively, where
$X' \subseteq X$. We know that $|X \setminus Z| \le e'_1$ and
$|X \setminus X'| = e'_1 + 1$. Therefore,

\[ |(X \setminus X') \cap Z| = |X \setminus X'| - |X \setminus Z| + |X' \setminus Z| > 0, \tag{4.3} \]

and z must take at least one nonzero coordinate from
$\mathrm{supp}(x) \setminus \mathrm{supp}(x')$.

Now we construct an $(e'_1 + 1)$-hypergraph³ H as follows: The vertex set is
$[n] \setminus \mathrm{supp}(x')$, and for every $x \in \mathcal{X}$, we put a
hyperedge containing $\mathrm{supp}(x) \setminus \mathrm{supp}(x')$. The
density of this hypergraph is at least $1 - \epsilon$, by the fact that
$|\mathcal{X}| \ge (1-\epsilon)|\mathcal{S}|$. Now Lemma 4.34 implies that H
has a matching of size at least

\[ t := \frac{(1-\epsilon)(n - d + 1)}{(e'_1 + 1)^2}. \]

As by (4.3), supp(z) must contain at least one element from the vertices in
each hyperedge of this matching, we conclude that
$|\mathrm{supp}(z) \setminus \mathrm{supp}(x')| \ge t$, and that
$e'_0 \ge t$. □

³ See Appendix 4.A for definitions.

The lemma above shows that if one is willing to keep the number $e'_1$ of false
negatives in the reconstruction at the zero level (or bounded by a constant),
only an up to $O(1/d)$ fraction of false negatives in the measurements can be
tolerated (regardless of the number of measurements), unless the number $e'_0$
of false positives in the reconstruction grows to an enormous amount (namely,
$\Omega(n)$ when $n - d = \Omega(n)$), which is certainly undesirable.

Recall that exact reconstruction of d-sparse vectors of length n, even in a
noise-free setting, requires at least $\Omega(d^2 \log_d n)$ non-adaptive
measurements. However, it turns out that there is no such restriction when an
approximate reconstruction is sought, except for the following bound, which can
be shown using simple counting and holds for adaptive noiseless schemes as well:

Lemma 4.10. Let M be an $m \times n$ measurement matrix that is
$(0, 0, e'_0, e'_1)$-resilient for d-sparse vectors. Then

\[ m \ge d \log(n/d) - d - e'_0 - O\big(e'_1 \log((n - d - e'_0)/e'_1)\big), \]

where the last term is defined to be zero for $e'_1 = 0$.

Proof. The proof is a simple counting argument. For integers $a > b > 0$,
we use the notation $V(a, b)$ for the volume of a Hamming ball of radius $b$ in
$\{0,1\}^a$. It is given by

$$V(a,b) = \sum_{i=0}^{b} \binom{a}{i} \leq 2^{a\,h(b/a)},$$

where $h(\cdot)$ is the binary entropy function defined as

$$h(x) := -x \log_2(x) - (1 - x) \log_2(1 - x),$$

and thus

$$\log V(a,b) \leq b \log\frac{a}{b} + (a-b)\log\frac{a}{a-b} = \Theta(b \log(a/b)).$$

Also, denote by $V'(a, b, e_0, e_1)$ the number of vectors in $\{0,1\}^a$ that are
$(e_0, e_1)$-close to a fixed $b$-sparse vector. Obviously,
$V'(a, b, e_0, e_1) \leq V(b, e_0)\,V(a - b, e_1)$. Now consider any (without loss of
generality, deterministic) reconstruction algorithm $D$ and let $X$ denote the set
of all vectors in $\{0,1\}^n$ that it returns for some noiseless encoding; that is,

$$X := \{x \in \{0,1\}^n \mid \exists y \in B,\ x = D(M[y])\},$$

where $B$ is the set of d-sparse vectors in $\{0,1\}^n$. Notice that all vectors in $X$
must be $(d + e'_0)$-sparse, as they have to be close to the corresponding "correct"
decoding. For each vector $x \in X$ and $y \in B$, we say that $x$ is matching to
$y$ if $(y, x)$ are $(e'_0, e'_1)$-close. A vector $x \in X$ can be matching to at most
$v := V'(n, d + e'_0, e'_0, e'_1)$ vectors in $B$, and we upper bound $\log v$ as follows:

$$\log v \leq \log V(n - d - e'_0, e'_1) + \log V(d + e'_0, e'_0) = O(e'_1 \log((n - d - e'_0)/e'_1)) + d + e'_0,$$

where the term inside $O(\cdot)$ is interpreted as zero when $e'_1 = 0$. Moreover,
every $y \in B$ must have at least one matching vector in $X$, namely, $D(M[y])$.
This means that $|X| \geq |B|/v$, and that

$$\log |X| \geq \log |B| - \log v \geq d\log(n/d) - d - e'_0 - O(e'_1 \log((n - d - e'_0)/e'_1)).$$

Finally, we observe that the number of measurements has to be at least $\log |X|$
to enable $D$ to output all the vectors in $X$, since $m$ binary outcomes can
distinguish at most $2^m$ different outputs. □

According to the lemma, even in the noiseless scenario, any reconstruction
method that returns an approximation of the sparse vector up to $e'_0 = O(d)$
false positives and without false negatives will require $\Omega(d \log(n/d))$ measurements.
As we will show in the next section, an upper bound of $O(d \log n)$
is in fact attainable even in a highly noisy setting using only non-adaptive
measurements. This in particular implies an asymptotically optimal trivial
two-stage group testing scheme.
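As a rough numerical illustration of the quantities in Lemma 4.10, the following Python sketch computes the Hamming ball volume $V(a,b)$ exactly, compares it with the entropy bound, and evaluates the explicit part $d\log(n/d) - d - e'_0$ of the lower bound for the case $e'_1 = 0$. The function names are hypothetical and chosen only for this illustration; logarithms are taken to base 2, and the entropy bound is only checked in the range $b \leq a/2$.

```python
from math import comb, log2


def hamming_ball_volume(a: int, b: int) -> int:
    """Exact V(a, b): number of points of {0,1}^a within Hamming distance b of a fixed point."""
    return sum(comb(a, i) for i in range(b + 1))


def binary_entropy(x: float) -> float:
    """Binary entropy h(x) in bits, with h(0) = h(1) = 0."""
    if x in (0.0, 1.0):
        return 0.0
    return -x * log2(x) - (1 - x) * log2(1 - x)


def entropy_bound_log2(a: int, b: int) -> float:
    """log2 of the upper bound 2^{a h(b/a)} on V(a, b) (used here only for b <= a/2)."""
    return a * binary_entropy(b / a)


def explicit_lower_bound(n: int, d: int, e0_prime: int) -> float:
    """d*log2(n/d) - d - e0': the explicit part of the bound when e1' = 0."""
    return d * log2(n / d) - d - e0_prime


print(hamming_ball_volume(10, 2), entropy_bound_log2(10, 2))
print(explicit_lower_bound(1024, 8, 8))
```

For instance, with n = 1024, d = 8 and $e'_0 = d$ the explicit part evaluates to 40, to be compared with $d \log n = 80$.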

4.2.2 A Noise-Resilient Construction 

In this section we introduce our general construction and design measurement
matrices for testing d-sparse vectors in $\{0,1\}^n$. The matrices can be seen as
adjacency matrices of certain unbalanced bipartite graphs constructed from
good randomness condensers or extractors. The main technique that we use to
show the desired properties is the list-decoding view of randomness condensers,
extractors, and expanders, developed over the recent years starting from the
work of Ta-Shma and Zuckerman on extractor codes [149] and followed by
Guruswami, Umans, Vadhan [78] and Vadhan [155].



4.2.2.1 Construction from Condensers 

We start by introducing the terms and tools that we will use in our construction
and its analysis.

Definition 4.11. (mixtures, agreement, and agreement list) Let $\Sigma$ be a finite
set. A mixture over $\Sigma^n$ is an n-tuple $S := (S_1, \ldots, S_n)$ such that every $S_i$,
$i \in [n]$, is a nonempty subset of $\Sigma$.

The agreement of $w := (w_1, \ldots, w_n) \in \Sigma^n$ with $S$, denoted by $\mathrm{Agr}(w, S)$, is
the quantity

$$\frac{1}{n}\,|\{i \in [n] \colon w_i \in S_i\}|.$$

Moreover, we define the quantity

$$\mathrm{wgt}(S) := \sum_{i \in [n]} |S_i|$$

and

$$\rho(S) := \mathrm{wgt}(S)/(n|\Sigma|),$$

where the latter is the expected agreement of a random vector with $S$.

For example, consider a mixture $S := (S_1, \ldots, S_8)$ over $[4]^8$ where $S_1 := \emptyset$,
$S_2 := \{1, 3\}$, $S_3 := \{1, 2\}$, $S_4 := \{1, 4\}$, $S_5 := \{1\}$, $S_6 := \{3\}$, $S_7 := \{4\}$,
$S_8 := \{1, 2, 3, 4\}$. For this example, we have

$$\mathrm{Agr}((1, 3, 2, 3, 4, 3, 4, 4), S) = 5/8,$$

and $\rho(S) = 13/32$.

For a code $C \subseteq \Sigma^n$ and $\alpha \in (0, 1]$, the $\alpha$-agreement list of $C$ with respect
to $S$, denoted by $\mathrm{LIST}_C(S, \alpha)$, is defined as the set

$$\mathrm{LIST}_C(S, \alpha) := \{c \in C \colon \mathrm{Agr}(c, S) > \alpha\}.$$
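The quantities of Definition 4.11 are easy to compute directly. The following Python sketch (function names are hypothetical, chosen for this illustration) reproduces the worked example above using exact rational arithmetic.

```python
from fractions import Fraction


def agreement(w, S):
    """Agr(w, S): fraction of positions i with w_i in S_i."""
    return Fraction(sum(1 for wi, Si in zip(w, S) if wi in Si), len(S))


def weight(S):
    """wgt(S): total size of the sets in the mixture."""
    return sum(len(Si) for Si in S)


def rho(S, alphabet_size):
    """rho(S) = wgt(S) / (n |Sigma|): expected agreement of a uniformly random vector."""
    return Fraction(weight(S), len(S) * alphabet_size)


def agreement_list(code, S, alpha):
    """LIST_C(S, alpha): codewords whose agreement with S exceeds alpha."""
    return [c for c in code if agreement(c, S) > alpha]


# The mixture over [4]^8 from the example in the text.
S = [set(), {1, 3}, {1, 2}, {1, 4}, {1}, {3}, {4}, {1, 2, 3, 4}]
w = (1, 3, 2, 3, 4, 3, 4, 4)
print(agreement(w, S), rho(S, 4))
```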

Definition 4.12. (induced code) Let $f \colon \Gamma \times \Omega \to \Sigma$ be a function mapping
a finite set $\Gamma \times \Omega$ to a finite set $\Sigma$. For $x \in \Gamma$, we use the shorthand $f(x)$ to
denote the vector $y := (y_i)_{i \in \Omega}$, $y_i := f(x, i)$, whose coordinates are indexed by
the elements of $\Omega$ in a fixed order. The code induced by $f$, denoted by $C(f)$, is
the set

$$\{f(x) \colon x \in \Gamma\}.$$

The induced code has a natural encoding function given by $x \mapsto f(x)$.

Definition 4.13. (codeword graph) Let $C \subseteq \Sigma^n$, $|\Sigma| = q$, be a q-ary code.
The codeword graph of $C$ is a bipartite graph with left vertex set $C$ and right
vertex set $[n] \times \Sigma$, such that for every $x = (x_1, \ldots, x_n) \in C$, there is an edge
between $x$ on the left and $(1, x_1), \ldots, (n, x_n)$ on the right. The adjacency
matrix of the codeword graph is an $n|\Sigma| \times |C|$ binary matrix whose $(i, j)$th
entry is 1 if and only if there is an edge between the $i$th right vertex and the
$j$th left vertex.

Figure 4.1: A function $f \colon \{0,1\}^4 \times [3] \to \{0,1\}$ with its truth table (top left),
codeword graph of the induced code (right), and the adjacency matrix of the
graph (bottom left). Solid, dashed and dotted edges in the graph respectively
correspond to the choices y = 1, y = 2, and y = 3 of the second argument.

A simple example of a function with its truth table, codeword graph of
the induced code along with its adjacency matrix is given in Figure 4.1.
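Definitions 4.12 and 4.13 can be turned into a few lines of code. The sketch below is only an illustration: the helper names and the toy function (which returns the $i$th bit of $x$) are assumptions made here and are not the function shown in Figure 4.1. Rows of the matrix are ordered by position first and by alphabet symbol second.

```python
def induced_code(f, Gamma, Omega):
    """C(f): one codeword (f(x, i))_{i in Omega} for every x in Gamma."""
    return [tuple(f(x, i) for i in Omega) for x in Gamma]


def codeword_graph_matrix(code, alphabet):
    """Adjacency matrix of the codeword graph.

    Rows are indexed by pairs (position i, symbol s); columns by codewords.
    The entry is 1 iff the codeword carries symbol s at position i.
    """
    n = len(code[0])
    return [[1 if c[i] == s else 0 for c in code]
            for i in range(n) for s in alphabet]


# Toy function (an assumption for illustration): f(x, i) is the i-th bit of x.
toy_f = lambda x, i: (x >> i) & 1
code = induced_code(toy_f, range(4), range(2))
M = codeword_graph_matrix(code, (0, 1))
for row in M:
    print(row)
```

Every column of the resulting matrix has Hamming weight exactly $n$, one entry per position, which is the property used later in the proof of Theorem 4.15.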

The following theorem is a straightforward generalization of the result in
[149] that is also shown in [78] (we have included a proof for completeness):



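Theorem 4.14. Let $f \colon \{0,1\}^{\tilde{n}} \times \{0,1\}^t \to \{0,1\}^\ell$ be a strong $k \to_\epsilon k'$ condenser,
and $C \subseteq \Sigma^{2^t}$ be its induced code, where $\Sigma := \{0,1\}^\ell$. Then for any
mixture $S$ over $\Sigma^{2^t}$ we have

$$|\mathrm{LIST}_C(S, \rho(S)2^{\ell - k'} + \epsilon)| < 2^k.$$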



Proof. Index the coordinates of $S$ by the elements of $\{0,1\}^t$ and denote the
$i$th coordinate by $S_i$. Let $Y$ be any random variable with min-entropy at least
$t + k'$ distributed on $\mathbb{F}_2^{t+\ell}$. Define an information-theoretic test
$T \colon \{0,1\}^\ell \times \{0,1\}^t \to \{0,1\}$ as follows: $T(x, i) = 1$ if and only if $x \in S_i$. Observe that

$$\Pr[T(Y) = 1] \leq \mathrm{wgt}(S)\,2^{-(t+k')} = \rho(S)\,2^{\ell - k'},$$

and that for every vector $w \in (\{0,1\}^\ell)^{2^t}$,

$$\Pr_{i \sim U_t}[T(w_i, i) = 1] = \mathrm{Agr}(w, S).$$

When $\alpha = 1$, we consider codewords with full agreement with the mixture.

Now, let the random variable $X = (X_1, \ldots, X_{2^t})$ be uniformly distributed on
the codewords in $\mathrm{LIST}_C(S, \rho(S)2^{\ell - k'} + \epsilon)$ and $Z \sim U_t$. Thus, from
Definition 4.11 we know that

$$\Pr[T(X_Z, Z) = 1] > \rho(S)2^{\ell - k'} + \epsilon.$$

As the choice of $Y$ was arbitrary, this implies that $T$ is able to distinguish
between the distribution of $(Z, X)$ and any distribution on $\{0,1\}^{t+\ell}$ with
min-entropy at least $t + k'$, with bias greater than $\epsilon$, which by the definition of
condensers implies that the min-entropy of $X$ must be less than $k$, or

$$|\mathrm{LIST}_C(S, \rho(S)2^{\ell - k'} + \epsilon)| < 2^k.$$

□

Now using the above tools, we are ready to describe and analyze our 
construction of error-resilient measurement matrices. We first state a general 
result without specifying the parameters of the condenser, and then instantiate 
the construction with various choices of the condenser, resulting in matrices 
with different properties. 

Theorem 4.15. Let $f \colon \{0,1\}^{\tilde{n}} \times \{0,1\}^t \to \{0,1\}^\ell$ be a strong $k \to_\epsilon k'$ condenser,
and $C$ be its induced code. Suppose that the parameters $p, \nu, \gamma > 0$
are chosen so that

$$(p + \gamma)2^{\ell - k'} + \nu/\gamma < 1 - \epsilon,$$

and $d := \gamma 2^\ell$. Then the adjacency matrix of the codeword graph of $C$ (which
has $m := 2^{t+\ell}$ rows and $n := 2^{\tilde{n}}$ columns) is a $(pm, (\nu/d)m, 2^k - d, 0)$-resilient
measurement matrix for d-sparse vectors. Moreover, it allows for a reconstruction
algorithm with running time $O(mn)$.

Proof. Define $L := 2^\ell$ and $T := 2^t$. Let $M$ be the adjacency matrix of the
codeword graph of $C$. It immediately follows from the construction that the
number of rows of $M$ (denoted by $m$) is equal to $TL$. Moreover, notice that
the Hamming weight of each column of $M$ is exactly $T$.

Let $x \in \{0,1\}^n$ and denote by $y \in \{0,1\}^m$ its encoding, i.e., $y := M[x]$,
and by $\hat{y} \in \{0,1\}^m$ a received word, or a noisy version of $y$.

The encoding of $x$ can be schematically viewed as follows: The coefficients
of $x$ are assigned to the left vertices of the codeword graph and the encoded
bit on each right vertex is the bitwise "or" of the values of its neighbors.

The coordinates of $x$ can be seen in one-to-one correspondence with the
codewords of $C$. Let $X \subseteq C$ be the set of codewords corresponding to the
support of $x$. The coordinates of the noisy encoding $\hat{y}$ are indexed by the
elements of $[T] \times [L]$ and thus, $\hat{y}$ naturally defines a mixture $S = (S_1, \ldots, S_T)$
over $[L]^T$, where $S_i$ contains $j$ iff $\hat{y}$ at position $(i, j)$ is 1.

Observe that $\rho(S)$ is the relative Hamming weight (denoted below by $\delta(\cdot)$)
of $\hat{y}$; thus, we have

$$\rho(S) = \delta(\hat{y}) \leq \delta(y) + p \leq d/L + p = \gamma + p,$$

where the last inequality comes from the fact that the relative weight of each
column of $M$ is exactly $1/L$ and that $x$ is d-sparse.

Furthermore, from the assumption we know that the number of false negatives
in the measurement is at most $\nu TL/d = \nu T/\gamma$. Therefore, any codeword
in $X$ must have agreement at least $1 - \nu/\gamma$ with $S$. This is because $S$ is
indeed constructed from a mixture of the elements in $X$, modulo false positives
(that do not decrease the agreement) and at most $\nu T/\gamma$ false negatives, each
of which can reduce the agreement by at most $1/T$.

Accordingly, we consider a decoder which, similar to the distance decoder
that we have introduced before, simply outputs a binary vector $\hat{x}$ supported
on the coordinates corresponding to those codewords of $C$ that have agreement
at least $1 - \nu/\gamma$ with $S$. Clearly, the running time of the decoder is linear
in the size of the measurement matrix.

By the discussion above, $\hat{x}$ must include the support of $x$. Moreover, since
$1 - \nu/\gamma > \rho(S)2^{\ell - k'} + \epsilon$ by the assumption on the parameters,
Theorem 4.14 applies for our choice of parameters, implying that $\hat{x}$ must have
weight less than $2^k$. □
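The mechanics of the decoder in the proof can be sketched in Python as follows. This is a toy under stated assumptions, not the construction of the theorem: the code used here consists of evaluations of degree-one polynomials over the integers modulo 3 (chosen only because it is tiny), not a condenser code, and all function names are hypothetical. The decoder keeps every codeword whose agreement with the mixture defined by the received word reaches a given threshold.

```python
from fractions import Fraction
from itertools import product


def codeword_graph_matrix(code, alphabet):
    """Rows indexed by (position, symbol), columns by codewords."""
    T = len(code[0])
    return [[int(c[i] == s) for c in code] for i in range(T) for s in alphabet]


def measure(M, x):
    """Noiseless outcomes: each row reports the OR of the entries of x it covers."""
    return [int(any(m and xj for m, xj in zip(row, x))) for row in M]


def decode(code, alphabet, y_hat, min_agreement):
    """Return x_hat supported on codewords with agreement >= min_agreement."""
    T, L = len(code[0]), len(alphabet)
    # S_i contains symbol s iff y_hat is 1 at position (i, s).
    S = [{s for k, s in enumerate(alphabet) if y_hat[i * L + k]} for i in range(T)]
    return [int(Fraction(sum(c[i] in S[i] for i in range(T)), T) >= min_agreement)
            for c in code]


alphabet = (0, 1, 2)
# Toy code (assumption): codeword (a + b*i mod 3) for i = 0, 1, 2.
code = [tuple((a + b * i) % 3 for i in range(3)) for a, b in product(range(3), repeat=2)]
M = codeword_graph_matrix(code, alphabet)

x = [0] * len(code)
x[4] = 1                        # a 1-sparse vector
y = measure(M, x)
y_noisy = list(y)
y_noisy[y.index(1)] = 0         # introduce one false negative
x_hat = decode(code, alphabet, y_noisy, Fraction(2, 3))
print(x_hat)
```

In this toy instance the single false negative lowers the agreement of the true codeword to 2/3, and no other codeword reaches that threshold, so the support is recovered exactly. The decoder inspects every codeword against every position, which matches the $O(mn)$ running time stated in the theorem up to the cost of set lookups.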

4.2.2.2 Instantiations 



Now we instantiate the general result given by Theorem 4.15 with various
choices of the underlying condenser, among the results discussed in Section 2.3,
and compare the obtained parameters. First, we consider two extreme cases,
namely, a non-explicit optimal condenser with zero overhead (i.e., extractor)
and then a non-explicit optimal condenser with zero loss (i.e., lossless condenser),
and then consider how known explicit constructions can approach the
obtained bounds. A summary of the obtained results is given in Table 4.1.



Optimal Extractors 

Recall Radhakrishnan and Ta-Shma's non-constructive bound that for every
choice of the parameters $k, \tilde{n}, \epsilon$, there is a strong $(k, \epsilon)$-extractor with input
length $\tilde{n}$, seed length $t = \log(\tilde{n} - k) + 2\log(1/\epsilon) + O(1)$ and output length
$\ell = k - 2\log(1/\epsilon) - O(1)$, and that the bound is achieved by a random function.
Plugging this result in Theorem 4.15, we obtain a non-explicit measurement
matrix from a simple, randomized construction that achieves the desired trade-off
with high probability, as stated in Corollary 4.16 below.

Table 4.1: A summary of constructions in Section 4.2.2. The parameters
α ∈ [0, 1) and δ ∈ (0, 1] are arbitrary constants, m is the number of measurements,
e_0 (resp., e_1) the number of tolerable false positives (resp., negatives)
in the measurements, and e'_0 is the number of false positives in the reconstruction.
The fifth column shows whether the construction is explicit (Exp)
or randomized (Rnd), and the last column shows the running time of the
reconstruction algorithm.

m                        e_0                    e_1         e'_0    Exp/Rnd   Rec. Time
O(d log n)               αm                     Ω(m/d)      O(d)    Rnd       O(mn)
O(d log n)               Ω(m)                   Ω(m/d)      δd      Rnd       O(mn)
O(d^{1+o(1)} log n)      αm                     Ω(m/d)      O(d)    Exp       O(mn)
d · quasipoly(log n)     Ω(m)                   Ω(m/d)      δd      Exp       O(mn)
d · quasipoly(log n)     αm                     Ω(m/d)      O(d)    Exp       poly(m)
poly(d) poly(log n)      poly(d) poly(log n)    Ω(e_0/d)    δd      Exp       poly(m)



Corollary 4.16. For every choice of constants $p \in [0, 1)$ and $\nu \in [0, \nu_0)$,
$\nu_0 := (\sqrt{5 - 4p} - 1)^3/8$, and positive integers $d$ and $n \geq d$, there is an m × n
measurement matrix, where $m = O(d \log n)$, that is $(pm, (\nu/d)m, O(d), 0)$-resilient
for d-sparse vectors of length n and allows for a reconstruction algorithm
with running time $O(mn)$.

Proof. For simplicity we assume that $n = 2^{\tilde{n}}$ and $d = 2^{\tilde{d}}$ for positive integers
$\tilde{n}$ and $\tilde{d}$. However, it should be clear that this restriction causes no loss of
generality and can be eliminated with a slight change in the constants behind
the asymptotic notations.

We instantiate the parameters of Theorem 4.15 using an optimal strong
extractor; for an extractor we have $k' = \ell$, so the factor $2^{\ell - k'}$ in the theorem
equals 1. If $\nu = 0$, we choose $\gamma, \epsilon$ small constants such that $\gamma + \epsilon < 1 - p$.
Otherwise, we choose $\gamma := \sqrt[3]{\nu}$, which makes $\nu/\gamma = \sqrt[3]{\nu^2}$, and
$\epsilon < 1 - p - \sqrt[3]{\nu} - \sqrt[3]{\nu^2}$. (One can easily see that the right hand side of the
latter inequality is positive for $\nu < \nu_0$.) Hence, the condition
$p + \nu/\gamma < 1 - \epsilon - \gamma$ required by Theorem 4.15 is satisfied.
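The role of the threshold $\nu_0$ can be checked numerically. The sketch below (hypothetical function names; floating-point arithmetic, so values near the threshold are only approximate) computes $\nu_0$ for a given $p$ and the largest admissible error for the choice $\gamma = \sqrt[3]{\nu}$ made in the proof.

```python
from math import sqrt


def nu_threshold(p: float) -> float:
    """nu_0 = (sqrt(5 - 4p) - 1)^3 / 8."""
    return (sqrt(5 - 4 * p) - 1) ** 3 / 8


def extractor_parameters(p: float, nu: float):
    """For nu > 0, return (gamma, eps_max) with gamma = nu^(1/3).

    Any error eps < eps_max satisfies p + nu/gamma < 1 - eps - gamma;
    eps_max is positive exactly when nu < nu_threshold(p).
    """
    gamma = nu ** (1.0 / 3.0)
    eps_max = 1 - p - gamma - nu / gamma
    return gamma, eps_max


print(nu_threshold(0.0))
print(extractor_parameters(0.0, 0.2))
print(extractor_parameters(0.0, 0.25))
```

For $p = 0$ the threshold is $\sqrt{5} - 2 \approx 0.236$; the value $\nu = 0.2$ leaves room for a positive error, while $\nu = 0.25$ does not.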

Let $r = 2\log(1/\epsilon) + O(1) = O(1)$ be the entropy loss of the extractor
for error $\epsilon$, and set up the extractor for min-entropy $k = \log d + \log(1/\gamma) + r$,
which means that $K := 2^k = O(d)$ and $L := 2^\ell = d/\gamma = O(d)$. Now
we can apply Theorem 4.15 and conclude that the measurement matrix is
$(pm, (\nu/d)m, O(d), 0)$-resilient. The seed length required by the extractor is
$t \leq \log \tilde{n} + 2\log(1/\epsilon) + O(1)$, which gives $T := 2^t = O(\log n)$. Therefore, the
number of measurements will be $m = TL = O(d \log n)$. □

Optimal Lossless Condensers



Now we instantiate Theorem 4.15 with an optimal strong lossless condenser
with input length $\tilde{n}$, entropy requirement $k$, seed length $t = \log \tilde{n} + \log(1/\epsilon) + O(1)$
and output length $\ell = k + \log(1/\epsilon) + O(1)$. Thus we get the following
corollary.

Corollary 4.17. For positive integers $n \geq d$ and every constant $\delta > 0$ there is
an m × n measurement matrix, where $m = O(d \log n)$, that is $(\Omega(m), \Omega(1/d)m, \delta d, 0)$-resilient
for d-sparse vectors of length n and allows for a reconstruction
algorithm with running time $O(mn)$.



Proof. We will use the notation of Theorem 4.15 and apply it using an optimal
strong lossless condenser. This time, we set up the condenser with error
$\epsilon := \frac{1}{2}\delta/(1 + \delta)$ and min-entropy $k$ such that $K := 2^k = d/(1 - 2\epsilon)$. As the
error is a constant, the overhead and hence $2^{\ell - k}$ will also be a constant. The
seed length is $t = \log(\tilde{n}/\epsilon) + O(1)$, which makes $T := 2^t = O(\log n)$. As
$L := 2^\ell = O(d)$, the number of measurements becomes $m = TL = O(d \log n)$,
as desired.

Moreover, note that our choice of $K$ implies that $K - d = \delta d$. Thus we
only need to choose $p$ and $\nu$ appropriately to satisfy the condition

(4.4)    $(p + \gamma)L/K + \nu/\gamma < 1 - \epsilon,$

where $\gamma = d/L = K/(L(1 + \delta))$ is a constant, as required by the theorem.
Substituting for $\gamma$ in (4.4) and after simple manipulations, we get the condition

$$pL/K + \nu(L/K)(1 + \delta) < \frac{\delta}{2(1 + \delta)},$$

which can be satisfied by choosing $p$ and $\nu$ to be appropriate positive constants.

□
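For completeness, the manipulation of condition (4.4) used in the proof above can be written out as follows. This is a worked restatement that uses only $\gamma = K/(L(1+\delta))$ and $\epsilon = \delta/(2(1+\delta))$ as set in the proof.

```latex
\begin{align*}
(p + \gamma)\frac{L}{K} + \frac{\nu}{\gamma}
  &= p\,\frac{L}{K} + \frac{1}{1+\delta} + \nu\,\frac{L}{K}\,(1+\delta), \\
1 - \epsilon - \frac{1}{1+\delta}
  &= \frac{\delta}{1+\delta} - \frac{\delta}{2(1+\delta)}
   = \frac{\delta}{2(1+\delta)}.
\end{align*}
```

Moving the term $1/(1+\delta)$ to the right-hand side of (4.4) therefore gives exactly $pL/K + \nu(L/K)(1+\delta) < \delta/(2(1+\delta))$.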

Both results obtained in Corollaries 4.16 and 4.17 almost match the lower
bound of Lemma 4.10 for the number of measurements. However, we note the
following distinction between the two results: Instantiating the general
construction of Theorem 4.15 with an extractor gives us sharp control over the
fraction of tolerable errors, and in particular, we can obtain a measurement
matrix that is robust against any constant fraction (bounded away from 1) of false
positives. However, the number of potential false positives in the reconstruction
will be bounded by some constant fraction of the sparsity of the vector
that cannot be made arbitrarily close to zero.

On the other hand, using a lossless condenser enables us to bring down the
number of false positives in the reconstruction to an arbitrarily small fraction
of d (which is, in light of Lemma 4.8, the best we can hope for). It does
not give as good a control on the fraction of tolerable errors as in the extractor
case, though we still obtain resilience against the same order of errors.

Recall that the simple divide-and-conquer adaptive construction given
at the beginning of the chapter consists of $O(\log(n/d))$ non-adaptive stages, where
within each stage $O(d)$ non-adaptive measurements are made, but the choice
of the measurements for each stage fully depends on all the previous outcomes.
By the lower bounds on the size of disjunct matrices, we know that
the number of non-adaptive rounds cannot be reduced to 1 without affecting
the total number of measurements by a multiplicative factor of $\Omega(d)$. However,
our non-adaptive upper bounds (Corollaries 4.16 and 4.17) show that
the number of rounds can be reduced to 2, while preserving the total number
of measurements at $O(d \log n)$. In particular, in a two-stage scheme, the first
non-adaptive round would output an approximation of the d-sparse vector up
to $O(d)$ false positives (even if the measurements are highly unreliable) and
the second round simply examines the $O(d)$ possible positions using trivial
singleton measurements to pinpoint the exact support of the vector.
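The second round of such a two-stage scheme is simple enough to sketch directly. In the Python illustration below, the candidate set stands in for the output of the first round (it is made up for the example), the singleton tests are assumed to be noiseless, and the names are hypothetical.

```python
def second_stage(candidates, is_positive_pool):
    """Test every candidate position on its own and keep the confirmed ones."""
    return {i for i in sorted(candidates) if is_positive_pool({i})}


defectives = {3, 17}                 # hidden support (unknown to the scheme)
candidates = {3, 5, 17, 20}          # assumed first-round output: support plus false positives

calls = []                           # record of the pools that were tested


def oracle(pool):
    """Noiseless group test: positive iff the pool contains a defective item."""
    calls.append(pool)
    return bool(pool & defectives)


support = second_stage(candidates, oracle)
print(support, len(calls))
```

The number of second-round measurements equals the size of the candidate set, which is $O(d)$ when the first round leaves only $O(d)$ false positives.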

Applying the Guruswami-Umans-Vadhan Extractor

While Corollaries 4.16 and 4.17 give probabilistic constructions of noise-resilient
measurement matrices, certain applications require a fully explicit matrix
that is guaranteed to work. To that end, we need to instantiate Theorem 4.15
with an explicit condenser. First, we use the nearly-optimal explicit
extractor of Guruswami, Umans and Vadhan (Theorem 2.24), which currently
gives the best trade-off for the range of parameters needed for our application.
Using this extractor, we obtain a similar trade-off as in Corollary 4.16,
except for a higher number of measurements which would be bounded by
$O(2^{O(\log^2 \log d)}\, d \log n) = O(d^{1+o(1)} \log n)$.

Corollary 4.18. For every choice of constants $p \in [0, 1)$ and $\nu \in [0, \nu_0)$,
$\nu_0 := (\sqrt{5 - 4p} - 1)^3/8$, and positive integers $d$ and $n \geq d$, there is a fully
explicit m × n measurement matrix, where

$$m = O(2^{O(\log^2 \log d)}\, d \log n) = O(d^{1+o(1)} \log n),$$

that is $(pm, (\nu/d)m, O(d), 0)$-resilient for d-sparse vectors of length n and
allows for a reconstruction algorithm with running time $O(mn)$. □

Applying "Zig-Zag" Lossless Condenser 

An important explicit construction of lossless condensers that has an almost
optimal output length is due to Capalbo et al. [23]. This construction borrows
the notion of "zig-zag products", a combinatorial tool for the construction
of expander graphs, as a major ingredient of the condenser. The following
theorem quotes a setting of this construction that is most useful for our
application:

Theorem 4.19. [23] For every $k \leq n \in \mathbb{N}$, $\epsilon > 0$ there is an explicit $k \to_\epsilon k$
condenser$^5$ with seed length $d = O(\log^3(n/\epsilon))$ and output length
$m = k + \log(1/\epsilon) + O(1)$. □



Combining Theorem 4.15 with the above condenser, we obtain a similar
result as in Corollary 4.17, except that the number of measurements would be
$d\,2^{\log^3(\log n)} = d \cdot \mathrm{quasipoly}(\log n)$.

Corollary 4.20. For positive integers $n \geq d$ and every constant $\delta > 0$ there
is a fully explicit m × n measurement matrix, where

$$m = d\,2^{\log^3(\log n)} = d \cdot \mathrm{quasipoly}(\log n),$$

that is $(\Omega(m), \Omega(1/d)m, \delta d, 0)$-resilient for d-sparse vectors of length n and
allows for a reconstruction algorithm with running time $O(mn)$. □



4.2.2.3 Measurements Allowing Sublinear Time Reconstruction 



The naive reconstruction algorithm given by Theorem 4.15 works efficiently
in linear time in the size of the measurement matrix. However, for very sparse
vectors (i.e., $d \ll n$), it might be of practical importance to have a reconstruction
algorithm that runs in sublinear time in n, the length of the vector,
and ideally, polynomial in the number of measurements, which is merely
poly(log n, d) if the number of measurements is optimal.

As shown in [149], if the code $C$ in Theorem 4.14 is obtained from a strong
extractor constructed from a black-box pseudorandom generator (PRG), it is
possible to compute the agreement list (which is guaranteed by the theorem
to be small) more efficiently than a simple exhaustive search over all possible
codewords. In particular, in this case they show that $\mathrm{LIST}_C(S, \rho(S) + \epsilon)$ can be
computed in time $\mathrm{poly}(2^t, 2^\ell, 2^k, 1/\epsilon)$ (where $t, \ell, k, \epsilon$ are respectively the seed
length, output length, entropy requirement, and error of the extractor), which
can be much smaller than $2^{\tilde{n}}$ ($\tilde{n}$ being the input length of the extractor).

Currently two constructions of extractors from black-box PRGs are known:
Trevisan's extractor [152] (as well as its improvement in [123]) and
Shaltiel-Umans' extractor [134]. However, the latter can only extract a sub-constant
fraction of the min-entropy and is not suitable for our needs, although it requires
a considerably shorter seed than Trevisan's extractor. Thus, here we only
consider Raz's improvement of Trevisan's extractor given in Theorem 2.20.
Using this extractor in Theorem 4.15, we obtain a measurement matrix for
which the reconstruction is possible in polynomial time in the number of
measurements; however, as the seed length required by this extractor is larger
than that of Theorem 2.24, we will now require a higher number of measurements
than before. Specifically, using Trevisan's extractor, we get the following.

Corollary 4.21. For every choice of constants $p \in [0, 1)$ and $\nu \in [0, \nu_0)$,
$\nu_0 := (\sqrt{5 - 4p} - 1)^3/8$, and positive integers $d$ and $n \geq d$, there is a fully
explicit m × n measurement matrix $M$ that is $(pm, (\nu/d)m, O(d), 0)$-resilient
for d-sparse vectors of length n, where

$$m = O(d\,2^{\log^3 \log n}) = d \cdot \mathrm{quasipoly}(\log n).$$

Furthermore, $M$ allows for a reconstruction algorithm with running time
poly(m), which would be sublinear in n for $d = O(n^c)$ and a suitably small
constant $c > 0$. □

5 Though not explicitly mentioned in [23], these condensers can be considered to be
strong.

On the condenser side, we observe that the strong lossless (and lossy)
condensers due to Guruswami et al. (given in Theorem 2.22) also allow efficient
list-recovery. The code induced by this condenser is precisely a list-decodable
code due to Parvaresh and Vardy [118]. Thus, the efficient list
recovery algorithm of the condenser is merely the list-decoding algorithm for
this code$^6$. Combined with Theorem 4.15, we can show that codeword graphs
of Parvaresh-Vardy codes correspond to good measurement matrices that allow
sublinear time recovery, but with incomparable parameters to what we
obtained from Trevisan's extractor (the proof is similar to Corollary 4.17):



Corollary 4.22. For positive integers $n \geq d$ and any constants $\delta, \alpha > 0$ there
is an m × n measurement matrix, where

$$m = O(d^{3 + \alpha + 2/\alpha} (\log n)^{2 + 2/\alpha}),$$

that is $(\Omega(e), \Omega(e/d), \delta d, 0)$-resilient for d-sparse vectors of length n, where

$$e := (\log n)^{1 + 1/\alpha}\, d^{2 + 1/\alpha}.$$

Moreover, the matrix allows for a reconstruction algorithm with running time
poly(m). □

We remark that we could also use a lossless condenser due to Ta-Shma et
al. [148] which is based on Trevisan's extractor and also allows efficient list
recovery, but it achieves inferior parameters compared to Corollary 4.22.

4.2.2.4 Connection with List-Recoverability

Extractor codes that we used in Theorem 4.15 are instances of soft-decision
decodable codes$^7$ that provide high list-decodability in "extremely noisy" scenarios.
In fact it is not hard to see that good extractors or condensers are
required for our construction to carry through, as Theorem 4.14 can be shown
to hold, up to some loss in parameters, in the reverse direction as well (as
already shown by Ta-Shma and Zuckerman [149, Theorem 1] for the case of
extractors).

6 For similar reasons, any construction of measurement matrices based on codeword
graphs of algebraic codes that are equipped with efficient soft-decision decoding (including the
original Reed-Solomon based construction of Kautz and Singleton [89]) allows sublinear time
reconstruction.

7 To be precise, here we are dealing with a special case of soft-decision decoding with
binary weights.

However, for designing measurement matrices for the noiseless (or low-noise)
case, it is possible to resort to the slightly weaker notion of list recoverable
codes. Formally, a code $C$ of block length $\tilde{n}$ over an alphabet $\Sigma$ is called
$(\alpha, d, \tilde{\ell})$-list recoverable if for every mixture $S$ over $\Sigma^{\tilde{n}}$ consisting of sets of
size at most d each, we have $|\mathrm{LIST}_C(S, \alpha)| \leq \tilde{\ell}$. A simple argument similar
to Theorem 4.15 shows that the adjacency matrix of the codeword graph of
such a code with rate $R$ gives a $(\log n)|\Sigma|/R \times n$ measurement matrix$^8$ for
d-sparse vectors in the noiseless case with at most $\tilde{\ell} - d$ false positives in the
reconstruction.

Ideally, a list-recoverable code with $\alpha = 1$, alphabet size $O(d)$, positive
constant rate, and list size $\tilde{\ell} = O(d)$ would give an $O(d \log n) \times n$ matrix for
d-sparse vectors, which is almost optimal (furthermore, the recovery would
be possible in sublinear time if $C$ is equipped with efficient list recovery).
However, no explicit construction of such a code is so far known.

Two natural choices of codes with good list-recoverability properties are
Reed-Solomon and Algebraic-Geometric codes, which in fact provide soft-decision
decoding with short list size (cf. [74]). However, while the list size is
polynomially bounded by n and d, it can be much larger than the $O(d)$ that we
need for our application even if the rate is polynomially small in d.

On the other hand, it is shown in [77] that folded Reed-Solomon codes are
list-recoverable with constant rate, but again they suffer from large alphabet
and list size$^9$.

We also point out a construction of $(\alpha, d, d)$ list-recoverable codes (allowing
list recovery in time $O(nd)$) in [77] with rate polynomially small but alphabet
size exponentially large in d, from which they obtain superimposed codes.

4.2.2.5 Connection with the Bit-Probe Model and Designs 

An important problem in data structures is the static set membership problem
in the bit-probe model, which is the following: Given a set $S$ of at most d elements
from a universe of size n, store the set as a string of length m such that any
query of the type "is x in S?" can be reliably answered by reading few bits of
the encoding. The query algorithm might be probabilistic, and be allowed to
err with a small one- or two-sided error. Information theoretically, it is easy to
see that $m = \Omega(d \log(n/d))$ regardless of the bit-probe complexity and even if
a small constant error is allowed.

8 For codes over large alphabets, the factor $|\Sigma|$ in the number of rows can be improved
using concatenation with a suitable inner measurement matrix.

9 As shown in [78], folded Reed-Solomon codes can be used to construct lossless
condensers, which eliminates the list size problem. However, they give inferior parameters
compared to Parvaresh-Vardy codes used in Corollary 4.22.

Remarkably, it was shown in [19] that the lower bound on m can be
(non-explicitly) achieved using only one bit-probe. Moreover, a part of their work
shows that any one-probe scheme with negative one-sided error $\epsilon$ (where the
scheme only errs in case $x \notin S$) gives a $\lfloor d/\epsilon \rfloor$-superimposed code (and hence,
requires $m = \Omega(d \log n)$ by [54]). It follows that from any such scheme one can
obtain a measurement matrix for exact reconstruction of sparse vectors, which,
by Lemma 4.8, cannot provide high resiliency against noise. The converse
direction, i.e., using superimposed codes to design bit-probe schemes, does not
necessarily hold unless the error is allowed to lie very close to 1. However,
in [19] combinatorial designs$^{10}$ based on low-degree polynomials are used to
construct one bit-probe schemes with $m = O(d^2 \log n)$ and small one-sided
error.

On the other hand, Kautz and Singleton [89] observed that the encoding of
a combinatorial design as a binary matrix corresponds to a superimposed code
(which is in fact slightly error-resilient). Moreover, they used Reed-Solomon
codes to construct a design, which in particular gives a d-superimposed code.
This is in fact the same design that is used in [19], and in our terminology, can
be regarded as the adjacency matrix of the codeword graph of a Reed-Solomon
code.

It is interesting to observe the intimate similarity between our framework
given by Theorem 4.15 and classical constructions of superimposed codes.
Some key differences are worth mentioning, however. Both constructions
are based on codeword graphs of error-correcting codes, but classical
superimposed codes owe their properties to the large distance of the underlying
code. Our construction, on the other hand, uses extractor and condenser
codes and does not give a superimposed code, simply because of the substantially
low number of measurements. As shown in Theorem 4.15, these codes
are nevertheless good enough for a slight relaxation of the notion of superimposed codes
because of their soft-decision list decodability properties, which additionally
enables us to attain high noise resilience and a considerably smaller number
of measurements.

Interestingly, Buhrman et al. [19] use randomly chosen bipartite graphs to
construct storage schemes with two-sided error requiring nearly optimal space
$O(d \log n)$, and Ta-Shma [147] later shows that expander graphs from lossless
condensers would be sufficient for this purpose. However, unlike schemes
with negative one-sided error, these schemes use encoders that cannot be
implemented by the "or" function and thus do not translate to group testing
schemes.

10 A design is a collection of subsets of a universe, each of the same size, such that the
pairwise intersection of any two subsets is upper bounded by a prespecified parameter.

4.3 The Threshold Model

A natural generalization of classical group testing, introduced by Damaschke
[42], considers the case where the measurement outcomes are determined by
a threshold predicate instead of logical "or".

In particular, the threshold model is characterized by two integer parameters
$\ell, u$ such that $0 < \ell \leq u$ (that are considered to be fixed constants), and
each measurement outputs positive if the number of positives within the
corresponding pool is at least $u$. On the other hand, if the number of positives
is less than $\ell$, the test returns negative, and otherwise the outcome can be
arbitrary. In this view, classical group testing corresponds to the special case
where $\ell = u = 1$. In addition to being of theoretical interest, the threshold
model is interesting for applications, in particular in biology, where the
measurements have reduced or unpredictable sensitivity or may depend on various
factors that must be simultaneously present in the sample.
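The measurement rule of the threshold model can be stated as a short function. In this Python sketch the names are hypothetical, and the `in_gap` argument is an assumption introduced here to model the arbitrary outcome when the count falls between the two thresholds.

```python
def threshold_outcome(pool, positives, lower, upper, in_gap=False):
    """Outcome of one threshold test with thresholds 0 < lower <= upper.

    Positive if the pool holds at least `upper` positives, negative if it
    holds fewer than `lower`; otherwise the outcome is arbitrary and is
    modelled here by the caller-supplied value `in_gap`.
    """
    count = len(set(pool) & set(positives))
    if count >= upper:
        return True
    if count < lower:
        return False
    return in_gap


positives = {1, 2, 7}
print(threshold_outcome({1, 2, 3}, positives, lower=2, upper=3))   # count 2: in the gap
print(threshold_outcome({1, 2, 7}, positives, lower=2, upper=3))   # count 3: positive
print(threshold_outcome({3, 4}, positives, lower=1, upper=1))      # classical "or", negative
```

With `lower = upper = 1` the function reduces to the logical "or" of classical group testing.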

The difference $g := u - \ell$ between the thresholds is known as the gap
parameter. As shown by Damaschke [42], in threshold group testing
identification of the set of positives is only possible when the number of positives is at
least $u$. Moreover, regardless of the number of measurements, in general the
set of positives can only be identified within up to $g$ false positives and $g$ false
negatives (thus, unique identification can be guaranteed only when $\ell = u$).

Additionally, Damaschke constructed a scheme for identification of the
positives in the threshold model. For the gap-free case where $g = 0$, the
number of measurements in this scheme is $O((d + u^2) \log n)$, which is nearly
optimal (within constant factors). However, when $g > 0$, the number of
measurements becomes $O(dn^b + d^u)$, for an arbitrary constant $b > 0$, if up
to $g + (u - 1)/b$ misclassifications are allowed. Moreover, Chang et al. [24]
have proposed a different scheme for the gap-free case that achieves $O(d \log n)$
measurements.

A drawback of the scheme presented by Damaschke (as well as the one 
by Chang et al.) is that the measurements are adaptive. As mentioned be- 
fore, for numerous applications (in particular, molecular biology), adaptive 
measurements are infeasible and must be avoided. 

In this section, we consider the non-adaptive threshold testing problem 
in a possibly noisy setting, and develop measurement matrices that can be 
used in the threshold model. Similar to the classical model of group testing, 
non-adaptive measurements in the threshold model can be represented as a 
Boolean matrix, where the ith row is the characteristic vector of the set of 
items that participate in the ith measurement.

4.3.1 Strongly Disjunct Matrices 

Non-adaptive threshold testing has been considered by Chen and Fu [27].
They observe that a generalization of the standard notion of disjunct matrices
(the latter being extensively used in the literature of classical group testing)
is suitable for the threshold model. In this section, we refer to this generalized 
notion as strongly disjunct matrices and to the standard notion as classical 
disjunct matrices. Strongly disjunct matrices can be defined as follows. 

Definition 4.23. A Boolean matrix (with at least $d + u$ columns) is said to
be strongly $(d, e; u)$-disjunct if for every choice of $d + u$ columns

$$ C_1, \ldots, C_u, C'_1, \ldots, C'_d, $$

all distinct, we have

$$ \Big| \bigcap_{i=1}^{u} \mathrm{supp}(C_i) \setminus \bigcup_{i=1}^{d} \mathrm{supp}(C'_i) \Big| > e. $$

Observe that strongly $(d, e; u)$-disjunct matrices are, in particular,
strongly $(d', e'; u')$-disjunct for any $d' \le d$, $e' \le e$, and
$u' \le u$. Moreover, classical $(d, e)$-disjunct matrices correspond to the
special case $u = 1$.
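To make Definition 4.23 concrete, the following is a minimal brute-force sketch of a checker. It is only an illustration: the function names `column_supports` and `is_strongly_disjunct` are hypothetical, matrices are assumed to be given as lists of 0/1 rows, and the exhaustive search is exponential in $d + u$, so it is meant for toy instances only.

```python
from itertools import combinations


def column_supports(matrix):
    """Return, for each column, the set of row indices that hold a 1."""
    n_cols = len(matrix[0])
    return [{r for r, row in enumerate(matrix) if row[c]} for c in range(n_cols)]


def is_strongly_disjunct(matrix, d, e, u):
    """Exhaustively test the condition of Definition 4.23 (toy sizes only)."""
    supports = column_supports(matrix)
    n = len(supports)
    if n < d + u:
        raise ValueError("the matrix needs at least d + u columns")
    all_rows = set(range(len(matrix)))
    for ones in combinations(range(n), u):
        # Rows where every column C_1, ..., C_u has a 1.
        common = set(all_rows)
        for c in ones:
            common &= supports[c]
        rest = [c for c in range(n) if c not in ones]
        for zeros in combinations(rest, d):
            # Rows where at least one of C'_1, ..., C'_d has a 1.
            covered = set().union(*(supports[c] for c in zeros))
            if len(common - covered) <= e:
                return False
    return True
```

For instance, the $4 \times 4$ identity matrix passes the test for $(d, e; u) = (3, 0; 1)$ but fails for $u = 2$, since no row contains two ones.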

An important motivation for the study of this notion is the following hidden
hypergraph learning problem (cf. [51, Chapter 6] and [50, Chapter 12]), itself
being motivated by the so-called complex model in computational biology [26].
A $(\le u)$-hypergraph is a tuple $(V, E)$ where $V$ and $E$ are known as the
set of vertices and hyper-edges, respectively. Each hyper-edge $e \in E$ is a
non-empty subset of $V$ of size at most $u$. The classical notion of
undirected graphs (with self-loops) corresponds to $(\le 2)$-hypergraphs.

Now, suppose that $G$ is a $(\le u)$-hypergraph on a vertex set $V$ of size
$n$, and denote by $V(G)$ the set of vertices induced by the hyper-edge set of
$G$; i.e., $v \in V(G)$ if and only if $G$ has a hyper-edge incident to $v$.
Then, assuming that $|V(G)| \le d$ for a sparsity parameter $d$, the aim in the
hypergraph-learning problem is to identify $G$ using as few (non-adaptive)
queries of the following type as possible: Each query specifies a set
$Q \subseteq V$, and its corresponding answer is a Boolean value which is 1 if
and only if $G$ has a hyper-edge contained in $Q$.
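The query model just described can be sketched in a few lines. This is an illustrative sketch only: hyper-edges are assumed to be given as Python sets of vertex indices, and the helper names `query_answer` and `outcome_vector` are hypothetical.

```python
def query_answer(hyperedges, query):
    """Return 1 iff some hyper-edge is entirely contained in the query set."""
    q = set(query)
    return int(any(set(edge) <= q for edge in hyperedges))


def outcome_vector(hyperedges, matrix):
    """Answer every query of a 0/1 matrix whose rows are characteristic vectors."""
    return [
        query_answer(hyperedges, {c for c, bit in enumerate(row) if bit})
        for row in matrix
    ]
```

With hyper-edges $\{0, 1\}$ and $\{2\}$, a query containing vertices 0 and 3 but not 1 or 2 is answered 0, because neither hyper-edge lies inside it.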



It is known [26, 66] that, in the hypergraph learning problem, any suitable
grouping strategy defines a strongly disjunct matrix (whose rows are charac- 
teristic vectors of individual queries Q), and conversely, any strongly disjunct 
matrix can be used as the incidence matrix of the set of queries. Below we 
recollect a simple proof of this fact. 

Lemma 4.24. Let $M$ be a strongly $(d, e; u)$-disjunct matrix with columns
indexed by the elements of a vertex set $V$, and $G$ and $G'$ be any two
distinct $(\le u)$-hypergraphs on $V$ such that $|V(G)| \le d$ and
$|V(G')| \le d$. Then the vectors of the outcomes corresponding to the queries
defined by $M$ on $G$ and $G'$ differ in more than $e$ positions. Conversely,
if $M$ is such that the query outcomes differ in more than $e$ positions for
every choice of the hypergraphs $G$ and $G'$ as above, then it must be
strongly $(d - u, e; u)$-disjunct.

Proof. Suppose that $M$ is an $m \times |V|$ strongly $(d, e; u)$-disjunct
matrix, and consider distinct $(\le u)$-hypergraphs $G = (V, E)$ and
$G' = (V, E')$ with $|V(G)| \le d$ and $|V(G')| \le d$. Denote by
$y, y' \in \{0,1\}^m$ the vectors of query outcomes for the two graphs $G$ and
$G'$, respectively. Without loss of generality, let $S \in E$ be chosen such
that no hyper-edge of $G'$ is contained in it. Let $V' := V(G') \setminus S$,
and denote by $C_1, \ldots, C_{|S|}$ (resp., $C'_1, \ldots, C'_{|V'|}$) the
columns of $M$ corresponding to the vertices in $S$ (resp., $V'$). By
Definition 4.23 there is a set $T \subseteq [m]$ of more than $e$ indices such
that for every $i \in [|S|]$ (resp., $i \in [|V'|]$) and every $t \in T$,
$C_i(t) = 1$ (resp., $C'_i(t) = 0$). This means that, for each such $t$, the
answer to the $t$th query must be 1 for $G$ (as the query includes the vertex
set of $S$) but 0 for $G'$ (considering the assumption that no edge of $G'$ is
contained in $S$).

For the converse, let $S, Z \subseteq V$ be disjoint sets of vertices such
that $|S| = u$ and $|Z| = d - u$, and denote by $\{C_1, \ldots, C_u\}$ and
$\{C'_1, \ldots, C'_{d-u}\}$ the sets of columns of $M$ picked by $S$ and $Z$,
respectively. Take any $v \in S$, let the $u$-hypergraph $G = (V, E)$ be a
$u$-clique on $Z \cup S \setminus \{v\}$, and $G' = (V, E')$ be such that
$E' := E \cup \{S\}$. Denote by $y, y' \in \{0,1\}^m$ the vectors of query
outcomes for the two graphs $G$ and $G'$, respectively. Since $G$ is a
subgraph of $G'$, it must be that $\mathrm{supp}(y) \subseteq \mathrm{supp}(y')$.

Let $T := \mathrm{supp}(y') \setminus \mathrm{supp}(y)$. By the distinguishing
property of $M$, the set $T$ must have more than $e$ elements. Take any
$t \in T$. We know that the $t$th query defined by $M$ returns positive for
$G'$ but negative for $G$. Thus this query must contain the vertex set of $S$
(the only hyper-edge of $G'$ that is not in $G$), but not any of the elements
in $Z$ (since otherwise, it would include some $z \in Z$ and subsequently
$\{z\} \cup S \setminus \{v\}$, which is a hyper-edge of $G$). It follows that
for each $i \in [u]$ (resp., $i \in [d - u]$), we must have $C_i(t) = 1$
(resp., $C'_i(t) = 0$) and the disjunctness property as required by
Definition 4.23 holds. □



The parameter $e$ determines the "noise tolerance" of the measurement scheme.
Namely, a strongly $(d, e; u)$-disjunct matrix can uniquely distinguish between
$d$-sparse hypergraphs even in the presence of up to $\lfloor e/2 \rfloor$
erroneous query outcomes.

The key observation made by Chen and Fu [27] is that threshold group
testing corresponds to the special case of the hypergraph learning problem
where the hidden graph $G$ is known to be a $u$-clique$^{11}$. In this case, the
unknown Boolean vector in the corresponding threshold testing problem would
be the characteristic vector of $V(G)$. It follows that strongly disjunct
matrices are suitable choices for the measurement matrices in threshold group
testing.

More precisely, the result by Chen and Fu states that, for threshold
parameters $\ell$ and $u$, a strongly $(d - \ell - 1, 2e; u)$-disjunct matrix
suffices to distinguish between $d$-sparse vectors in the threshold
model$^{12}$, even if up to $e$ erroneous measurements are allowed.

11 A $u$-clique on the vertex set $V$ is a $(\le u)$-hypergraph $(V, E)$ such
that, for some $V' \subseteq V$, $E$ is the set of all subsets of $V'$ of size
$u$.

Many of the known results for classical disjunct matrices can be extended
to strongly disjunct matrices by following similar ideas. In particular, the
probabilistic result of Theorem 4.4 can be generalized to show that strongly
$(d, e; u)$-disjunct matrices exist with

$$ m = O\big(d^{u+1} (\log(n/d)) / (1-p)^2\big) $$

rows and error tolerance

$$ e = \Omega\big(p d \log(n/d) / (1-p)^2\big), $$

for any noise parameter $p \in [0, 1)$. On the negative side, however, several
concrete lower bounds are known for the number of rows of such matrices
[53, 144, 145]. In asymptotic terms, these results show that one must have

$$ m = \Omega(d^{u+1} \log_d n + e d^u), $$

and thus, the probabilistic upper bound is essentially optimal.

4.3.2 Strongly Disjunct Matrices from Codes 

For the underlying strongly disjunct matrix, Chen and Fu [27] use a greedy
construction [28] that achieves, for any $e \ge 0$,
$O((e + 1) d^{u+1} \log(n/d))$ rows, but may take exponential time in the size
of the resulting matrix.

Nevertheless, as observed by several researchers [26, 53, 66, 91], a classical
explicit construction of combinatorial designs due to Kautz and Singleton [89]
can be extended to construct strongly disjunct matrices. This
concatenation-based construction transforms any error-correcting code having
large distance into a disjunct matrix.

While the original construction of Kautz and Singleton uses Reed-Solomon
codes and achieves nice bounds, it is possible to use other families of codes.
In particular, as was shown by Porat and Rothschild [120], codes on the
Gilbert-Varshamov bound (see Appendix A) would result in nearly optimal
disjunct matrices. Moreover, for a suitable range of parameters, they give a
deterministic construction of such codes that runs in polynomial time in the
size of the resulting disjunct matrix (albeit exponential in the code's
dimension$^{13}$).

In this section, we will elaborate on details of this (known) class of con- 
structions, and in addition to Reed-Solomon codes and codes on the Gilbert- 
Varshamov bound (that, as mentioned above, were used by Kautz, Singleton, 

12 Considering unavoidable assumptions that up to $g := u - \ell$ false
positives and $g$ false negatives are allowed in the reconstruction, and that
the vector being measured has weight at least $u$.

13 In this regard, this construction of disjunct matrices can be considered
weakly explicit in that, contrary to fully explicit constructions, it is not
clear if each individual entry of the matrix can be computed in time
$\mathrm{poly}(d, \log n)$.

Given: An $(\tilde{n}, k, \tilde{d})_q$ error-correcting code
$\mathcal{C} \subseteq [q]^{\tilde{n}}$, and integer parameter $u > 0$.

Output: An $m \times n$ Boolean matrix $M$, where $n = q^k$ and
$m = \tilde{n} q^u$.

Construction: First, consider the mapping
$\varphi\colon [q] \to \{0,1\}^{q^u}$ from $q$-ary symbols to column vectors
of length $q^u$ defined as follows. Index the coordinates of the output
vector by the $u$-tuples from the set $[q]^u$. Then $\varphi(x)$ has a 1 at
position $(a_1, \ldots, a_u)$ if and only if there is an $i \in [u]$ such
that $a_i = x$. Arrange all codewords of $\mathcal{C}$ as columns of an
$\tilde{n} \times q^k$ matrix $M'$ with entries from $[q]$. Then replace each
entry $x$ of $M'$ with $\varphi(x)$ to obtain the output $m \times n$ matrix
$M$.

Construction 4.1: Extension of Kautz-Singleton's method [89].
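The steps of Construction 4.1 can be sketched as follows. This is a toy illustration under stated assumptions, not an efficient implementation: the alphabet size `q` is assumed to be prime so that arithmetic modulo `q` gives a field, the code is a full-length Reed-Solomon code, and all function names (`reed_solomon_codewords`, `phi`, `kautz_singleton`, `good_rows`) are hypothetical.

```python
from itertools import product


def reed_solomon_codewords(q, k):
    """All q**k codewords of the length-q Reed-Solomon code over Z_q (q prime).

    Each codeword lists the evaluations of a polynomial of degree < k at
    the points 0, 1, ..., q - 1, so the minimum distance is q - k + 1.
    """
    words = []
    for coeffs in product(range(q), repeat=k):
        words.append(
            [sum(c * pow(x, i, q) for i, c in enumerate(coeffs)) % q
             for x in range(q)]
        )
    return words


def phi(x, q, u):
    """Map a symbol x to a 0/1 vector indexed by the u-tuples over [q]."""
    return [1 if x in tup else 0 for tup in product(range(q), repeat=u)]


def kautz_singleton(codewords, q, u):
    """Replace every symbol of every codeword (a column of M') by phi(symbol)."""
    n_tilde = len(codewords[0])
    rows = []
    for pos in range(n_tilde):
        blocks = [phi(word[pos], q, u) for word in codewords]
        for t in range(q ** u):
            rows.append([block[t] for block in blocks])
    return rows


def good_rows(matrix, ones, zeros):
    """Rows with a 1 in every column of `ones` and a 0 in every column of `zeros`."""
    return [
        r for r, row in enumerate(matrix)
        if all(row[c] for c in ones) and not any(row[c] for c in zeros)
    ]
```

With `q = 5`, `k = 2` and `u = 2`, the output has $5 \cdot 5^2 = 125$ rows and $5^2 = 25$ columns, and `good_rows` can be used to spot-check the disjunctness property on chosen column sets.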



Porat and Rothschild), will consider a family of algebraic-geometric codes and
Hermitian codes which give nice bounds as well. Construction 4.1 describes
the general idea, which is analyzed in the following lemma.

Lemma 4.25. Construction 4.1 outputs a strongly $(d, e; u)$-disjunct matrix
for every $d < (\tilde{n} - e)/((\tilde{n} - \tilde{d})u)$ (here $\tilde{n}$
and $\tilde{d}$ denote the length and the minimum distance of the underlying
code).

Proof. Let $C := \{c_1, \ldots, c_u\} \subseteq [n]$ and
$C' := \{c'_1, \ldots, c'_d\} \subseteq [n]$ be disjoint subsets of column
indices. We wish to show that, for more than $e$ rows of $M$, the entries at
positions picked by $C$ are all-ones while those picked by $C'$ are all-zeros.
For each $j \in [n]$, denote the $j$th column of $M'$ by $M'(j)$, and let
$M'(C) := \{M'(c_j) : j \in [u]\}$ and $M'(C') := \{M'(c'_j) : j \in [d]\}$.

From the minimum distance of the code, we know that every two distinct columns
of $M'$ agree in at most $\tilde{n} - \tilde{d}$ positions. By a union bound,
for each $i \in [d]$, the number of positions where $M'(c'_i)$ agrees with one
or more of the codewords in $M'(C)$ is at most $u(\tilde{n} - \tilde{d})$, and
the number of positions where some vector in $M'(C')$ agrees with one or more
of those in $M'(C)$ is at most $du(\tilde{n} - \tilde{d})$. By assumption, we
have $\tilde{n} - du(\tilde{n} - \tilde{d}) > e$, and thus, for a set
$E \subseteq [\tilde{n}]$ of size greater than $e$, at positions picked by $E$
none of the codewords in $M'(C')$ agree with any of the codewords in $M'(C)$.

Now let $w \in [q]^n$ be any of the rows of $M'$ picked by $E$, and consider
the $q^u \times n$ Boolean matrix $W$ formed by applying the mapping
$\varphi(\cdot)$ on each entry of $w$. We know that
$\{w(c_j) : j \in [u]\} \cap \{w(c'_j) : j \in [d]\} = \emptyset$. Thus we
observe that the particular row of $W$ indexed by
$(w(c_1), \ldots, w(c_u))$ (and in fact, any of its permutations) must have
all-ones at positions picked by $C$ and all-zeros at those picked by $C'$. As
any such row is a distinct row of $M$, it follows that $M$ is strongly
$(d, e; u)$-disjunct. □

Now we mention a few specific instantiations of the above construction.
We will first consider the family of Reed-Solomon codes, which are also used
in the original work of Kautz and Singleton [89], and then move on to the
family of algebraic geometric (AG) codes on the Tsfasman-Vlăduţ-Zink (TVZ)
bound, and Hermitian codes, and finally, codes on the Gilbert-Varshamov
(GV) bound. A quick review of the necessary background on coding-theoretic
terms is given in Appendix A.

Reed-Solomon Codes 

Let $p \in [0, 1)$ be an arbitrary "noise" parameter. If we take the code to
be an $[\tilde{n}, k, \tilde{d}]_{\tilde{n}}$ Reed-Solomon code over an
alphabet of size $\tilde{n}$ (more precisely, the smallest prime power that is
no less than $\tilde{n}$), where $\tilde{d} = \tilde{n} - k + 1$, we get a
strongly disjunct $(d, e; u)$-matrix with

$$ m = O\big((du \log n/(1-p))^{u+1}\big) $$

rows and

$$ e = p\tilde{n} = \Omega(pdu(\log n)/(1-p)). $$
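The Reed-Solomon parameters above can be traced back to Lemma 4.25 with a short calculation. The following is a sketch of that calculation under the simplifying assumptions that $q = \tilde{n}$ exactly and $n = q^k$; constants are not optimized.

```latex
% Reed-Solomon code: q = \tilde{n}, \tilde{d} = \tilde{n} - k + 1,
% hence \tilde{n} - \tilde{d} = k - 1.
% Lemma 4.25 with e = p\tilde{n} requires du(\tilde{n} - \tilde{d}) < \tilde{n} - e:
\begin{align*}
  du(k-1) < (1-p)\tilde{n}
  \quad &\Longleftarrow \quad
  \tilde{n} \ge \frac{du\,k}{1-p}, \\
  n = \tilde{n}^{k}
  \quad &\Longrightarrow \quad
  k = \frac{\log n}{\log \tilde{n}} \le \log n, \\
  \text{so it suffices to take}\quad
  \tilde{n} &= \Theta\!\Big(\frac{du \log n}{1-p}\Big), \\
  m = \tilde{n}\, q^{u} = \tilde{n}^{u+1}
  &= O\!\Big(\Big(\frac{du \log n}{1-p}\Big)^{u+1}\Big).
\end{align*}
```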



AG Codes on the TVZ Bound 

Another interesting family for the code is the family of algebraic geometric
codes that attain the Tsfasman-Vlăduţ-Zink bound (cf. [6T, 154]). This family
is defined over any alphabet size $q \ge 49$ that is a square prime power, and
achieves a minimum distance
$\tilde{d} \ge \tilde{n} - k - \tilde{n}/(\sqrt{q} - 1)$. Let
$e := p\tilde{n}$, for a noise parameter $p \in [0, 1)$. By Lemma 4.25, the
underlying code needs to have minimum distance at least
$\tilde{n}(1 - (1-p)/(du))$. Thus in order to be able to use the
above-mentioned family of AG codes, we need to have
$q \gg (du/(1-p))^2 =: q_0$. Let us take an appropriate
$q \in [2q_0, 8q_0]$, and following Lemma 4.25,
$\tilde{n} - \tilde{d} = \lceil \tilde{n}(1-p)/(du) \rceil$. Thus the
dimension of the code becomes at least

$$ k \ge \tilde{n} - \tilde{d} - \frac{\tilde{n}}{\sqrt{q} - 1} = \Omega\Big(\frac{\tilde{n}(1-p)}{du}\Big) = \Omega(\tilde{n}/\sqrt{q_0}), $$

and subsequently$^{14}$ we get that
$\log n = k \log q \ge k = \Omega(\tilde{n}/\sqrt{q_0})$. Now, noting that
$m = q^u \tilde{n}$, we conclude that

$$ m = q^u \tilde{n} = O(q_0^{u+1/2} \log n) = O\Big(\Big(\frac{du}{1-p}\Big)^{2u+1}\Big) \log n, $$

and $e = \Omega(pdu(\log n)/(1-p))$.



14 Note that, given the parameters $p, d, n$, the choice of $q$ depends on
$p, d$, as explained above, and then one can choose the code length
$\tilde{n}$ to be the smallest integer for which we have $q^k \ge n$. But for
the sake of clarity we have assumed that $q^k = n$.

We see that the dependence of the number of measurements on the sparsity
parameter $d$ is worse for AG codes than Reed-Solomon codes by a factor $d^u$,
but the construction from AG codes benefits from a linear dependence on
$\log n$, compared to $\log^{u+1} n$ for Reed-Solomon codes. Thus, AG codes
become more favorable only when the sparsity is substantially low; namely,
when $d \ll \log n$.

Hermitian Codes 

A particularly nice family of AG codes arises from the Hermitian function
field$^{15}$. Let $q'$ be a prime power and $q := q'^2$. Then the Hermitian
function field over $\mathbb{F}_q$ is a finite extension of the rational
function field $\mathbb{F}_q(x)$, denoted by $\mathbb{F}_q(x, y)$, where we
have $y^{q'} + y = x^{q'+1}$. The structure of this function field is
relatively well understood and the family of Goppa codes defined over the
rational points of the Hermitian function field is known as Hermitian codes.
This family was recently used by Ben-Aroya and Ta-Shma [10] for the
construction of small-bias sets. Below we quote some parameters of Hermitian
codes from their work.

The number of rational points of the Hermitian function field is equal to
$q'^3 + 1$, which includes a common pole $Q_\infty$ of $x$ and $y$. The genus
of the function field is $g = q'(q' - 1)/2$. For some integer parameter $r$,
we take $G := rQ_\infty$ as the divisor defining the Riemann-Roch space
$\mathcal{L}(G)$ of the code, and the set of rational points except
$Q_\infty$ as the evaluation points of the code. Thus the length of the code
becomes $\tilde{n} = q'^3$. Moreover, the minimum distance of the code is
$\tilde{d} = \tilde{n} - \deg(G) = \tilde{n} - r$. When $r \ge 2g - 1$, the
dimension of the code is given by the Riemann-Roch theorem, which is equal to
$r - g + 1$. For the low-degree regime where $r < 2g - 1$, the dimension $k$
of the code is the size of the Weierstrass semigroup of $G$, which turns out
to be the set

$$ W = \{(i, j) \in \mathbb{N}^2 : j \le q' - 1 \wedge iq' + j(q' + 1) \le r\}. $$

Now, given parameters $d, p$ of the disjunct matrix, define
$\rho := (1-p)/((d+1)u)$, take the alphabet size $q$ as a square prime power,
and set $r := \rho q^{3/2}$. First we consider the case where
$r < 2g - 1 = q - \sqrt{q} - 1$. In this case, the dimension of the Hermitian
code becomes $k = |W| = \Omega(r^2/q) = \Omega(\rho^2 q^2)$. The distance
$\tilde{d}$ of the code satisfies
$\tilde{d} = \tilde{n} - r \ge \tilde{n}(1 - \rho)$ and thus, for
$e := p\tilde{n}$, the conditions of Lemma 4.25 are satisfied. The number of
rows of the resulting measurement matrix becomes $m = q^{u+3/2}$, and we have
$n = q^k$. Therefore,

$$ \log n = k \log q \ge k = \Omega(\rho^2 q^2) \;\Rightarrow\; q = O(\sqrt{\log n}/\rho) \;\Rightarrow\; m = O\Big(\Big(\frac{d\sqrt{\log n}}{1-p}\Big)^{u+3/2}\Big), $$

and in order to ensure that $r < 2g - 1$, we need to have
$du/(1-p) \gg \sqrt{\log n}$. On the other hand, when
$du/(1-p) \ll \sqrt{\log n}$, we are in the high-degree regime, in which case
the dimension of the code becomes
$k = r - g + 1 = \Omega(r) = \Omega(\rho q^{3/2})$, and we will thus have

$$ q = O((\log n/\rho)^{2/3}) \;\Rightarrow\; m = O\Big(\Big(\frac{d \log n}{1-p}\Big)^{1+2u/3}\Big). $$

15 See [142] for an extensive treatment of the notions in algebraic geometry.

Altogether, we conclude that Construction 4.1 with Hermitian codes results
in a strongly $(d, e; u)$-disjunct matrix with

$$ m = O\Big(\Big(\frac{d\sqrt{\log n}}{1-p} + \Big(\frac{d \log n}{1-p}\Big)^{2/3}\Big)^{u+3/2}\Big) $$

rows, where
$e = p \cdot \Omega\big(d(\log n)/(1-p) + (d\sqrt{\log n}/(1-p))^{3/2}\big)$.
Compared to the Reed-Solomon codes, the number of measurements has a slightly
worse dependence on $d$, but a much better dependence on $n$. Compared to AG
codes on the TVZ bound, the dependence on $d$ is better while the dependence
on $n$ is inferior.

Codes on the GV Bound 

A $q$-ary $(\tilde{n}, k, \tilde{d})$-code (of sufficiently large length) is
said to be on the Gilbert-Varshamov bound if it satisfies
$k \ge \tilde{n}(1 - h_q(\tilde{d}/\tilde{n}))$, where $h_q(\cdot)$ is the
$q$-ary entropy function defined as

$$ h_q(x) := x \log_q(q - 1) - x \log_q(x) - (1 - x) \log_q(1 - x). $$

It is well known that a random linear code achieves the bound with
overwhelming probability (cf. [103]). Now we apply Lemma 4.25 on a code on the
GV bound, and calculate the resulting parameters. Let
$\rho := (1-p)/(4du)$, choose any alphabet size $q \in [1/\rho, 2/\rho]$, and
let the code be any $q$-ary code of length $\tilde{n}$ on the GV bound, with
minimum distance $\tilde{d} \ge \tilde{n}(1 - 2/q)$. By the Taylor expansion
of the function $h_q(x)$ around $x = 1 - 1/q$, we see that the dimension of
the code asymptotically behaves as $k = \Theta(\tilde{n}/(q \log q))$. Thus
the number of columns of the resulting measurement matrix becomes
$n = q^k = 2^{\Omega(\tilde{n}/q)}$, and therefore, the number $m$ of its rows
becomes

$$ m = q^u \tilde{n} = O(q^{u+1} \log n) = O((d/(1-p))^{u+1} \log n), $$

and the matrix would be strongly $(d, e; u)$-disjunct for

$$ e = p\tilde{n} = \Omega(pd(\log n)/(1-p)). $$
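The $q$-ary entropy function and the dimension guaranteed by the GV bound are easy to evaluate numerically. The sketch below is only an illustration; the function names `q_ary_entropy` and `gv_dimension` are hypothetical, and the boundary values at $x = 0$ and $x = 1$ are filled in by continuity.

```python
import math


def q_ary_entropy(x, q):
    """h_q(x) = x log_q(q-1) - x log_q(x) - (1-x) log_q(1-x), for 0 <= x <= 1."""
    if not 0 <= x <= 1:
        raise ValueError("x must lie in [0, 1]")
    if x == 0:
        return 0.0
    if x == 1:
        return math.log(q - 1, q)
    return (
        x * math.log(q - 1, q)
        - x * math.log(x, q)
        - (1 - x) * math.log(1 - x, q)
    )


def gv_dimension(n_tilde, d_tilde, q):
    """Dimension n(1 - h_q(d/n)) promised by the Gilbert-Varshamov bound."""
    return n_tilde * (1 - q_ary_entropy(d_tilde / n_tilde, q))
```

The entropy attains its maximum value 1 at $x = 1 - 1/q$, which is why the expansion around that point governs the dimension when the relative distance is close to $1 - 1/q$.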

We remark that for the range of parameters that we are interested in, Porat
and Rothschild [120] have recently come up with a deterministic construction
of linear codes on the GV bound that runs in time $\mathrm{poly}(q^k)$ (and
thus, polynomial in the size of the resulting measurement matrix). Their
construction is based on a derandomization of the probabilistic argument for
random linear codes using the method of conditional expectations, and as such,
can be considered weakly explicit (in the sense that the entire measurement
matrix can be computed in polynomial time in its length; but for a fully
explicit construction one must be ideally able to deterministically compute
any single entry of the measurement matrix in time
$\mathrm{poly}(d, \log n)$, which is not the case for this construction).

Table 4.2: Bounds obtained by strongly $(d, e; u)$-disjunct matrices. The
noise parameter $p \in [0, 1)$ is arbitrary. The first four rows correspond to
the explicit coding-theoretic construction described in Section 4.3.2, with
the underlying code indicated as a remark.

Number of rows                                Noise tolerance                               Remark
$O((d/(1-p))^{u+1} \log n)$                   $\Omega(pd \log n/(1-p))$                     Using codes on the GV bound.
$O((d \log n/(1-p))^{u+1})$                   $\Omega(pd \log n/(1-p))$                     Using Reed-Solomon codes.
$O((d/(1-p))^{2u+1} \log n)$                  $\Omega(pd \log n/(1-p))$                     Using Algebraic Geometric codes.
$O((d\sqrt{\log n}/(1-p))^{u+3/2})$           $\Omega(p (d\sqrt{\log n}/(1-p))^{3/2})$      Using Hermitian codes ($d \gg \sqrt{\log n}$).
$O(d^{u+1} \log(n/d)/(1-p)^2)$                $\Omega(pd \log(n/d)/(1-p)^2)$                Probabilistic construction.
$\Omega(d^{u+1} \log_d n + e d^u)$            $e$                                           Lower bound (Section 4.3.1).



We see that, for a fixed $p$, Construction 4.1 when using codes on the GV
bound achieves almost optimal parameters. Moreover, the explicit construction
based on the Reed-Solomon codes possesses the "right" dependence on the
sparsity $d$, AG codes on the TVZ bound have a matching dependence on the
vector length $n$ with random measurement matrices, and finally, the trade-off
offered by the construction based on Hermitian codes lies in between the one
for Reed-Solomon codes and AG codes. These parameters are summarized in
Table 4.2. Note that the special case $u = 1$ would give classical
$(d, e)$-disjunct matrices as in Definition 4.1.



4.3.3 Disjunct Matrices for Threshold Testing 

Even though, as discussed above, the general notion of strongly
$(d, e; u)$-disjunct matrices is sufficient for threshold group testing with
upper threshold $u$, in this section we show that a weaker notion of disjunct
matrices (which turns out to be strictly weaker when the lower threshold
$\ell$ is greater than 1) would also suffice. We proceed by showing how such
measurement matrices can be constructed.

Before introducing our variation of disjunct matrices, let us fix some
notation that will be useful for the threshold model. Consider the threshold
model with thresholds $\ell$ and $u$, and an $m \times n$ measurement matrix
$M$ that defines the set of measurements. For a vector $x \in \{0,1\}^n$,
denote by $M[x]_{\ell,u}$ the set of vectors in $\{0,1\}^m$ that correctly
encode the measurement outcomes corresponding to the vector $x$. In
particular, for any $y \in M[x]_{\ell,u}$ we have $y(i) = 1$ if
$|\mathrm{supp}(M_i) \cap \mathrm{supp}(x)| \ge u$, and $y(i) = 0$ if
$|\mathrm{supp}(M_i) \cap \mathrm{supp}(x)| < \ell$, where $M_i$ indicates the
$i$th row of $M$. In the gap-free case, the set $M[x]_{\ell,u}$ may only have
a single element that we denote by $M[x]_u$. Note that the gap-free case with
$u = 1$ reduces to ordinary group testing, and thus we have $M[x]_1 = M[x]$.
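The outcome set $M[x]_{\ell,u}$ can be illustrated with a short sketch. The encoding is an assumption made for this example only: a row whose count falls in the gap (at least $\ell$ but below $u$) is reported as `None`, meaning that either answer is a valid outcome; the function name `threshold_outcomes` is hypothetical.

```python
def threshold_outcomes(matrix, x, lower, upper):
    """Per-row outcomes in the threshold model with thresholds lower <= upper.

    Returns 1 if at least `upper` positives fall in the pool, 0 if fewer
    than `lower` do, and None when the outcome may be arbitrary.
    """
    if not 0 < lower <= upper:
        raise ValueError("thresholds must satisfy 0 < lower <= upper")
    outcomes = []
    for row in matrix:
        hits = sum(a & b for a, b in zip(row, x))
        if hits >= upper:
            outcomes.append(1)
        elif hits < lower:
            outcomes.append(0)
        else:
            outcomes.append(None)
    return outcomes
```

In the gap-free case `lower == upper` no `None` entries can occur, so the returned list is the unique vector $M[x]_u$, and `lower == upper == 1` recovers ordinary group testing.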



To make the main ideas more transparent, until Section 4.3.3.3 we will
focus on the gap-free case where $\ell = u$. The extension to nonzero gaps is
straightforward and will be discussed in Section 4.3.3.3. Moreover, often we
will implicitly assume that the Hamming weight of the Boolean vector that is
to be identified is at least $u$ (since otherwise, any $(u - 1)$-sparse vector
would be confused with the all-zeros vector). Finally, we will take the
thresholds $\ell, u$ as fixed constants while the parameters $d$ and $n$ are
allowed to grow.

4.3.3.1 The Definition and Properties 

Our variation of disjunct matrices along with an "auxiliary" notion of regular 
matrices is defined in the following. 

Definition 4.26. A Boolean matrix $M$ with $n$ columns is called
$(d, e; u)$-regular if for every subset of columns $S \subseteq [n]$ (called
the critical set) and every $Z \subseteq [n]$ (called the zero set) such that
$u \le |S| \le d$, $|Z| \le |S|$, $S \cap Z = \emptyset$, there are more than
$e$ rows of $M$ at which $M|_S$ has weight exactly $u$ and (at the same rows)
$M|_Z$ has weight zero. Any such row is said to $u$-satisfy $S$ and $Z$.

If, in addition, for every distinguished column $i \in S$, more than $e$ rows
of $M$ both $u$-satisfy $S$ and $Z$ and have a 1 at the $i$th column, the
matrix is called $(d, e; u)$-disjunct (and the corresponding "good" rows are
said to $u$-satisfy $i$, $S$, and $Z$).
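Definition 4.26 can be checked exhaustively on toy matrices. The sketch below is illustrative only: the names `rows_satisfying` and `is_regular` are hypothetical, matrices are lists of 0/1 rows, and the enumeration over all pairs $(S, Z)$ is exponential in $d$.

```python
from itertools import combinations


def rows_satisfying(matrix, S, Z, u):
    """Rows whose restriction to S has weight exactly u and to Z has weight 0."""
    return [
        r for r, row in enumerate(matrix)
        if sum(row[c] for c in S) == u and not any(row[c] for c in Z)
    ]


def is_regular(matrix, d, e, u, disjunct=False):
    """Test (d, e; u)-regularity; with disjunct=True, test (d, e; u)-disjunctness."""
    n = len(matrix[0])
    for s_size in range(u, d + 1):
        for S in combinations(range(n), s_size):
            rest = [c for c in range(n) if c not in S]
            for z_size in range(min(s_size, len(rest)) + 1):
                for Z in combinations(rest, z_size):
                    good = rows_satisfying(matrix, S, Z, u)
                    if len(good) <= e:
                        return False
                    # Disjunctness: every distinguished column i in S must be
                    # hit by more than e of the satisfying rows.
                    if disjunct and any(
                        sum(matrix[r][i] for r in good) <= e for i in S
                    ):
                        return False
    return True
```

As a small example, the $6 \times 4$ matrix whose rows are all weight-2 vectors is $(2, 0; 2)$-disjunct, whereas the $4 \times 4$ identity matrix is not even $(2, 0; 2)$-regular because none of its rows has weight 2.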

It is easy to verify that (assuming $2d \le n$) the classical notion of
$(2d - 1, e)$-disjunct matrices is equivalent to strongly
$(2d - 1, e; 1)$-disjunct and $(d, e; 1)$-disjunct. Moreover, any
$(d, e; u)$-disjunct matrix is $(d, e; u)$-regular,
$(d - 1, e; u - 1)$-regular, and $(d, e)$-disjunct (but the reverse
implications do not in general hold). Therefore, the lower bound

$$ m = \Omega(d^2 \log_d n + ed) $$

that applies for $(d, e)$-disjunct matrices holds for $(d, e; u)$-disjunct
matrices as well.

Below we show that our notion of disjunct matrices is necessary and
sufficient for the purpose of threshold group testing:

Lemma 4.27. Let $M$ be an $m \times n$ Boolean matrix that is
$(d, e; u)$-disjunct. Then for every distinct $d$-sparse vectors
$x, x' \in \{0,1\}^n$ such that$^{16}$
$\mathrm{supp}(x) \nsubseteq \mathrm{supp}(x')$,
$\mathrm{wgt}(x) \ge |\mathrm{supp}(x') \setminus \mathrm{supp}(x)|$ and
$\mathrm{wgt}(x) \ge u$, we have

$$ |\mathrm{supp}(M[x]_u) \setminus \mathrm{supp}(M[x']_u)| > e. \qquad (4.5) $$

Conversely, assuming $d \ge 2u$, if $M$ satisfies (4.5) for every choice of
$x$ and $x'$ as above, it must be $(\lfloor d/2 \rfloor, e; u)$-disjunct.

Proof. First, suppose that $M$ is $(d, e; u)$-disjunct, and let
$y := M[x]_u$ and $y' := M[x']_u$. Take any
$i \in \mathrm{supp}(x) \setminus \mathrm{supp}(x')$, and let
$S := \mathrm{supp}(x)$ and
$Z := \mathrm{supp}(x') \setminus \mathrm{supp}(x)$. Note that $|S| \le d$
and by assumption, we have $|Z| \le |S|$. Now, Definition 4.26 implies that
there is a set $E$ of more than $e$ rows of $M$ that $u$-satisfy $i$ as the
distinguished column, $S$ as the critical set and $Z$ as the zero set. Thus
for every $j \in E$, the $j$th row of $M$ restricted to the columns chosen by
$\mathrm{supp}(x)$ must have weight exactly $u$, while its weight on
$\mathrm{supp}(x')$ is less than $u$. Therefore, $y(j) = 1$ and $y'(j) = 0$
for more than $e$ choices of $j$.

For the converse, consider any choice of a distinguished column $i \in [n]$,
a critical set $S \subseteq [n]$ containing $i$ (such that $|S| \ge u$), and
a zero set $Z \subseteq [n]$ where $|Z| \le |S|$. Define $d$-sparse Boolean
vectors $x, x' \in \{0,1\}^n$ so that $\mathrm{supp}(x) := S$ and
$\mathrm{supp}(x') := S \cup Z \setminus \{i\}$. Let $y := M[x]_u$ and
$y' := M[x']_u$ and $E := \mathrm{supp}(y) \setminus \mathrm{supp}(y')$. By
assumption we know that $|E| > e$. Take any $j \in E$. Since $y(j) = 1$ and
$y'(j) = 0$, we get that the $j$th row of $M$ restricted to the columns picked
by $S \cup Z \setminus \{i\}$ must have weight at most $u - 1$, whereas it
must have weight at least $u$ when restricted to $S$. As the sets $\{i\}$,
$S \setminus \{i\}$, and $Z$ are disjoint, this can hold only if
$M[j, i] = 1$, and moreover, the $j$th row of $M$ restricted to the columns
picked by $S$ (resp., $Z$) has weight exactly $u$ (resp., zero). Hence, this
row (as well as all the rows of $M$ picked by $E$) must $u$-satisfy $i$, $S$,
and $Z$, confirming that $M$ is $(\lfloor d/2 \rfloor, e; u)$-disjunct. □

We will use regular matrices as intermediate building blocks in our
constructions of disjunct matrices to follow. The connection with disjunct
matrices is made apparent through a direct product of matrices defined in
Construction 4.2. Intuitively, using this product, regular matrices can be
used to transform any measurement matrix suitable for the standard group
testing model to one with comparable properties in the threshold model. The
following lemma formalizes this idea.

Lemma 4.28. Let $M_1$ and $M_2$ be Boolean matrices with $n$ columns, such
that $M_1$ is $(d - 1, e_1; u - 1)$-regular. Let $M := M_1 \odot M_2$, and
suppose that for $d$-sparse Boolean vectors $x, x' \in \{0,1\}^n$ such that
$\mathrm{wgt}(x) \ge \mathrm{wgt}(x')$, we have

$$ |\mathrm{supp}(M_2[x]_1) \setminus \mathrm{supp}(M_2[x']_1)| \ge e_2. $$

Then, $|\mathrm{supp}(M[x]_u) \setminus \mathrm{supp}(M[x']_u)| \ge (e_1 + 1)e_2$.

16 Note that at least one of the two possible orderings of any two distinct
$d$-sparse vectors, at least one having weight $u$ or more, satisfies this
condition.

Proof. First we consider the case where $u > 1$. Let
$y := M_2[x]_1 \in \{0,1\}^{m_2}$ and $y' := M_2[x']_1 \in \{0,1\}^{m_2}$,
where $m_2$ is the number of rows of $M_2$, and let
$E := \mathrm{supp}(y) \setminus \mathrm{supp}(y')$. By assumption,
$|E| \ge e_2$. Fix any $i \in E$ so that $y(i) = 1$ and $y'(i) = 0$.
Therefore, the $i$th row of $M_2$ must have all zeros at positions
corresponding to $\mathrm{supp}(x')$ and there is a
$j \in \mathrm{supp}(x) \setminus \mathrm{supp}(x')$ such that
$M_2[i, j] = 1$. Define $S := \mathrm{supp}(x) \setminus \{j\}$,
$Z := \mathrm{supp}(x') \setminus \mathrm{supp}(x)$, $z := M[x]_u$ and
$z' := M[x']_u$.

As $\mathrm{wgt}(x) \ge \mathrm{wgt}(x')$, we know that $|Z| \le |S| + 1$.
The extreme case $|Z| = |S| + 1$ only happens when $x$ and $x'$ have disjoint
supports, in which case one can remove an arbitrary element of $Z$ to ensure
that $|Z| \le |S|$ and the following argument (considering the assumption
$u > 1$) still goes through. By the definition of regularity, there is a set
$E_1$ consisting of at least $e_1 + 1$ rows of $M_1$ that $(u - 1)$-satisfy
the critical set $S$ and the zero set $Z$. Pick any $k \in E_1$, and observe
that $z$ must have a 1 at position $(k, i)$. This is because the row of $M$
indexed by $(k, i)$ has a 1 at the $j$th position (since the $i$th row of
$M_2$ does), and at least $u - 1$ more 1's at positions corresponding to
$\mathrm{supp}(x) \setminus \{j\}$ (due to regularity of $M_1$). On the other
hand, note that the $k$th row of $M_1$ has at most $u - 1$ ones at positions
corresponding to $\mathrm{supp}(x')$ (because
$\mathrm{supp}(x') \subseteq S \cup Z$), and the $i$th row of $M_2$ has all
zeros at those positions (because $y'(i) = 0$). This means that the row of
$M$ indexed by $(k, i)$ (which is the bit-wise or of the $k$th row of $M_1$
and the $i$th row of $M_2$) must have less than $u$ ones at positions
corresponding to $\mathrm{supp}(x')$, and thus, $z'$ must be 0 at position
$(k, i)$. Therefore, $z$ and $z'$ differ at position $(k, i)$.

Since there are at least $e_2$ choices for $i$, and for each choice of $i$, at
least $e_1 + 1$ choices for $k$, we conclude that in at least $(e_1 + 1)e_2$
positions, $z$ has a one while $z'$ has a zero.

The argument for $u = 1$ is similar, in which case it suffices to take
$S := \mathrm{supp}(x)$ and
$Z := \mathrm{supp}(x') \setminus \mathrm{supp}(x)$. □
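The direct product used in Lemma 4.28 (specified in Construction 4.2) simply takes the bit-wise or of every row of the first matrix with every row of the second. A minimal sketch, with the hypothetical function name `direct_product` and matrices given as lists of 0/1 rows:

```python
def direct_product(m1, m2):
    """Direct product of two 0/1 matrices with the same number of columns.

    The row indexed by (i, j) is the bit-wise or of row i of m1 and row j
    of m2; rows are listed with i as the outer index and j as the inner.
    """
    if len(m1[0]) != len(m2[0]):
        raise ValueError("both matrices must have the same number of columns")
    return [[a | b for a, b in zip(r1, r2)] for r1 in m1 for r2 in m2]
```

The result has $m_1 m_2$ rows, so the row indexed by $(i, j)$ (counting from zero) sits at position $i \cdot m_2 + j$ of the returned list.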



Given: Boolean matrices $M_1$ and $M_2$ that are $m_1 \times n$ and
$m_2 \times n$, respectively.

Output: An $m \times n$ Boolean matrix $M_1 \odot M_2$, where
$m := m_1 m_2$.

Construction: Let the rows of $M := M_1 \odot M_2$ be indexed by the set
$[m_1] \times [m_2]$. Then the row corresponding to $(i, j)$ is defined as
the bit-wise or of the $i$th row of $M_1$ and the $j$th row of $M_2$.

Construction 4.2: Direct product of measurement matrices.

Given: Integer parameters $n, m', d, u$.

Output: An $m \times n$ Boolean matrix $M$, where
$m := m' \lceil \log(d/u) \rceil$.

Construction: Let $r := \lceil \log(d/u) \rceil$. Index the rows of $M$ by
$[r] \times [m']$. Sample the $(i, j)$th row of $M$ independently from a
$(u + 1)$-wise independent distribution on $n$ bit vectors, where each
individual bit has probability $1/(2^{i+2} u)$ of being 1.

Construction 4.3: Probabilistic construction of regular and disjunct matrices.
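The sampling step of Construction 4.3 can be sketched as follows. Several simplifications are assumptions of this sketch rather than part of the construction: bits are drawn fully independently (which is in particular $(u + 1)$-wise independent but uses far more randomness than necessary), the logarithm is taken to base 2, at least one scale is always used, and the function name `sample_threshold_matrix` is hypothetical.

```python
import math
import random


def sample_threshold_matrix(n, m_prime, d, u, seed=None):
    """Sample m_prime rows for each scale i = 1, ..., r, with r = ceil(log2(d/u)).

    At scale i every bit is 1 with probability 1 / (2**(i + 2) * u).
    Assumption: fully independent bits stand in for a (u+1)-wise
    independent distribution.
    """
    rng = random.Random(seed)
    r = max(1, math.ceil(math.log2(d / u)))  # assumption: use at least one scale
    rows = []
    for i in range(1, r + 1):
        prob = 1.0 / (2 ** (i + 2) * u)
        for _ in range(m_prime):
            rows.append([1 if rng.random() < prob else 0 for _ in range(n)])
    return rows
```

The different scales make the rows progressively sparser, so that for each possible size of the critical set there is a scale at which a row has a constant chance of containing exactly $u$ of its elements.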



As a corollary it follows that, when $M_1$ is a $(d - 1, e_1; u - 1)$-regular
and $M_2$ is a $(d, e_2)$-disjunct matrix, the product
$M := M_1 \odot M_2$ will distinguish between any two distinct $d$-sparse
vectors (of weight at least $u$) in at least $(e_1 + 1)(e_2 + 1)$ positions of
the measurement outcomes. This combined with Lemma 4.27 would imply that $M$
is, in particular,
$(\lfloor d/2 \rfloor, (e_1 + 1)(e_2 + 1) - 1; u)$-disjunct. However, using a
direct argument similar to the above lemma it is possible to obtain a slightly
better result, given by Lemma 4.29 (the proof follows the same line of
argument as that of Lemma 4.28 and is thus omitted).

Lemma 4.29. Suppose that $M_1$ is a $(d, e_1; u - 1)$-regular and $M_2$ is a
$(2d, e_2)$-disjunct matrix. Then $M_1 \odot M_2$ is a
$(d, (e_1 + 1)(e_2 + 1) - 1; u)$-disjunct matrix. □

As another particular example, we remark that the resilient measurement
matrices that we constructed in Section 4.2.2 for the ordinary group testing
model can be combined with regular matrices to offer the same qualities
(i.e., approximation of sparse vectors in highly noisy settings) in the
threshold model. In the same way, numerous existing results in group testing
can be ported to the threshold model by using Lemma 4.28 (e.g., constructions
of measurement matrices suitable for trivial two-stage schemes; cf. [29]).

4.3.3.2 Constructions 

In this section, we obtain several constructions of regular and disjunct
matrices. Our first construction, described in Construction 4.3, is a
randomness-efficient probabilistic construction that can be analyzed using
standard techniques from the probabilistic method. The bounds obtained by
this construction are given by Lemma 4.30 below. The number of random bits
required by this construction is polynomially bounded in $d$ and $\log n$,
which is significantly smaller than it would be had we picked the entries of
$M$ fully independently.

Lemma 4.30. For every $p \in [0, 1)$ and integer parameter $u > 0$,
Construction 4.3 with$^{17}$ $m' = O_u(d \log(n/d)/(1-p)^2)$ (resp.,
$m' = O_u(d^2 \log(n/d)/(1-p)^2)$) outputs a
$(d, \Omega_u(pm'); u)$-regular (resp.,
$(d, \Omega_u(pm'/d); u)$-disjunct) matrix with probability $1 - o(1)$.

Proof. We show the claim for regular matrices; the proof for disjunct matrices
is similar. Consider any particular choice of a critical set $S \subseteq [n]$
and a zero set $Z \subseteq [n]$ such that $u \le |S| \le d$ and $|Z| \le |S|$.
Choose an integer $i$ so that $2^{i-1}u \le |S| \le 2^i u$, and take any
$j \in [m']$. Denote the $(i,j)$th row of $M$ by the random variable
$w \in \{0,1\}^n$, and by $q$ the "success" probability that $w|_S$ has weight
exactly $u$ and $w|_Z$ is all zeros. For an integer $\ell > 0$, we will use the
shorthand $1^\ell$ (resp., $0^\ell$) for the all-ones (resp., all-zeros) vector
of length $\ell$. We have

\[
\begin{aligned}
q &= \sum_{R \subseteq S,\ |R| = u}
     \Pr\big[(w|_R) = 1^u \wedge (w|_{Z \cup (S \setminus R)}) = 0^{|S|+|Z|-u}\big] \\
  &= \sum_{R} \Pr\big[(w|_R) = 1^u\big] \cdot
     \Pr\big[(w|_{Z \cup (S \setminus R)}) = 0^{|S|+|Z|-u} \,\big|\, (w|_R) = 1^u\big] \\
  &\overset{(a)}{=} \sum_{R} \big(1/(2^{i+2}u)\big)^u \cdot
     \Big(1 - \Pr\big[(w|_{Z \cup (S \setminus R)}) \ne 0^{|S|+|Z|-u} \,\big|\, (w|_R) = 1^u\big]\Big) \\
  &\overset{(b)}{\ge} \sum_{R} \big(1/(2^{i+2}u)\big)^u \cdot
     \big(1 - (|S| + |Z| - u)/(2^{i+2}u)\big) \\
  &\overset{(c)}{\ge} \frac{1}{2}\binom{|S|}{u}\big(1/(2^{i+2}u)\big)^u
     \ge \frac{1}{2}\Big(\frac{|S|}{u}\Big)^u \cdot \big(1/(2^{i+2}u)\big)^u
     \ge \frac{1}{2^{3u+1} \cdot u^u} =: c,
\end{aligned}
\]

where (a) and (b) use the fact that the entries of $w$ are $(u+1)$-wise
independent, and (b) uses an additional union bound. Moreover, in (c) the
binomial term counts the number of possibilities for the set $R$. Note that the
lower bound $c > 0$ obtained at the end is a constant that only depends on $u$.
Now, let $e := m'pq$, and observe that the expected number of "successful" rows
is $m'q$. Using Chernoff bounds, and independence of the rows, the probability
that there are at most $e$ rows (among $(i,1), \ldots, (i,m')$) whose
restriction to $S$ and $Z$ has weights $u$ and $0$, respectively, becomes upper
bounded by

\[
\exp\big(-(m'q - e)^2/(2m'q)\big) = \exp\big(-(1-p)^2 m'q/2\big)
\le \exp\big(-(1-p)^2 m'c/2\big).
\]



17 The subscript in $O_u(\cdot)$ and $\Omega_u(\cdot)$ implies that the hidden
constant in the asymptotic notation is allowed to depend on $u$.

• Given: A strong lossless $(k, \epsilon)$-condenser
  $f\colon \{0,1\}^{\tilde{n}} \times \{0,1\}^t \to \{0,1\}^\ell$, integer
  parameter $u \ge 1$ and real parameter $p \in [0,1)$ such that
  $\epsilon < (1-p)/16$.

• Output: An $m \times n$ Boolean matrix $M$, where $n := 2^{\tilde{n}}$ and
  $m = 2^{t+k}\, O_u(2^{u(\ell-k)})$.

• Construction: Let $G_1 = (\{0,1\}^\ell, \{0,1\}^k, E_1)$ be any bipartite
  bi-regular graph with left vertex set $\{0,1\}^\ell$, right vertex set
  $\{0,1\}^k$, left degree $d_\ell := 8u$, and right degree
  $d_r := 8u2^{\ell-k}$. Replace each right vertex $v$ of $G_1$ with
  $\binom{d_r}{u}$ vertices, one for each subset of size $u$ of the vertices on
  the neighborhood of $v$, and connect them to the corresponding subsets.
  Denote the resulting graph by $G_2 = (\{0,1\}^\ell, V_2, E_2)$, where
  $|V_2| = 2^k\binom{d_r}{u}$. Define the bipartite graph
  $G_3 = (\{0,1\}^{\tilde{n}}, V_3, E_3)$, where $V_3 := \{0,1\}^t \times V_2$,
  as follows: Each left vertex $x \in \{0,1\}^{\tilde{n}}$ is connected to
  $(y, \Gamma_2(f(x,y)))$, for each $y \in \{0,1\}^t$, where $\Gamma_2(\cdot)$
  denotes the neighborhood function of $G_2$ (i.e., $\Gamma_2(v)$ denotes the
  set of vertices adjacent to $v$ in $G_2$). The output matrix $M$ is the
  bipartite adjacency matrix of $G_3$.

Construction 4.4: A building block for construction of regular matrices.

Now take a union bound on all the choices of $S$ and $Z$ to conclude that the
probability that the resulting matrix is not $(d, e; u)$-regular is at most

\[
\sum_{s=u}^{d} \binom{n}{s} \sum_{z=0}^{s} \binom{n-s}{z}
\exp\big(-(1-p)^2 m'c/2\big)
\le d^2 \binom{n}{d}^2 \exp\big(-(1-p)^2 m'c/2\big),
\]

which can be made $o(1)$ by choosing $m' = O_u(d \log(n/d)/(1-p)^2)$. □
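The lower bound on the success probability $q$ used in the proof can be sanity-checked numerically. The sketch below is only an illustration under a simplifying assumption: it takes the entries of a row to be fully independent Bernoulli variables with bias $1/(2^{i+2}u)$ (the proof itself only needs $(u+1)$-wise independence), computes $q$ exactly in that case, and compares it with the constant $c = 1/(2^{3u+1}u^u)$. The function names are hypothetical.

```python
from math import comb


def success_probability(s: int, z: int, u: int, i: int) -> float:
    """Exact q for a row whose entries are independent with bias 1/(2^(i+2) u).

    s = |S| and z = |Z|; success means weight exactly u on S and all zeros on Z.
    """
    bias = 1.0 / (2 ** (i + 2) * u)
    return comb(s, u) * bias ** u * (1.0 - bias) ** (s + z - u)


def lower_bound(u: int) -> float:
    """The constant c = 1 / (2^(3u+1) * u^u), which depends only on u."""
    return 1.0 / (2 ** (3 * u + 1) * u ** u)


def check(max_u: int = 4, max_i: int = 5) -> bool:
    """Verify q >= c whenever 2^(i-1) u <= |S| <= 2^i u and |Z| <= |S|."""
    for u in range(1, max_u + 1):
        for i in range(max_i + 1):
            smallest = max(u, (u * 2 ** i + 1) // 2)  # ceil(2^(i-1) u), at least u
            for s in range(smallest, u * 2 ** i + 1):
                for z in range(s + 1):
                    if success_probability(s, z, u, i) < lower_bound(u):
                        return False
    return True
```

Calling `check()` sweeps all admissible pairs $(|S|, |Z|)$ for small $u$ and $i$ and reports whether the bound holds throughout that range.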

Now we turn to a construction of regular matrices using strong lossless
condensers. Details of the construction are described in Construction 4.5 that
assumes a family of lossless condensers with different entropy
requirements$^{18}$, and in turn, uses Construction 4.4 as a building block.

The following theorem analyzes the obtained parameters without specifying
any particular choice for the underlying family of condensers.



18 We have assumed that all the functions in the family have the same seed
length $t$. If this is not the case, one can trivially set $t$ to be the
largest seed length in the family.

• Given: Integer parameters $d \ge u \ge 1$, real parameter $p \in [0,1)$, and
  a family $f_0, \ldots, f_r$ of strong lossless condensers, where
  $r := \lceil \log(d/u') \rceil$ and $u'$ is the smallest power of two such
  that $u' \ge u$. Each
  $f_i\colon \{0,1\}^{\tilde{n}} \times \{0,1\}^t \to \{0,1\}^{\ell(i)}$ is
  assumed to be a strong lossless $(k(i), \epsilon)$-condenser, where
  $k(i) := \log u' + i + 1$ and $\epsilon < (1-p)/16$.

• Output: An $m \times n$ Boolean matrix $M$, where $n := 2^{\tilde{n}}$ and
  $m = 2^t d \sum_{i=0}^{r} O_u(2^{u(\ell(i)-k(i))})$.

• Construction: For each $i \in \{0, \ldots, r\}$, denote by $M_i$ the output
  matrix of Construction 4.4 when instantiated with $f_i$ as the underlying
  condenser, and by $m_i$ its number of rows. Define $r_i := 2^{r-i}$ and let
  $M'_i$ denote the matrix obtained from $M_i$ by repeating each row $r_i$
  times. Construct the output matrix $M$ by stacking $M'_0, \ldots, M'_r$ on
  top of one another.

Construction 4.5: Regular matrices from strong lossless condensers.
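The final stacking step of Construction 4.5 is purely mechanical and can be sketched as follows. This is a minimal illustration, not the full construction: the input blocks are assumed to be already available as lists of 0/1 rows (in the construction they would be the matrices $M_0, \ldots, M_r$ produced by Construction 4.4), and the helper names are hypothetical.

```python
from typing import List

Matrix = List[List[int]]


def repeat_rows(matrix: Matrix, times: int) -> Matrix:
    """Return a copy of `matrix` in which every row appears `times` times in a row."""
    return [list(row) for row in matrix for _ in range(times)]


def stack_with_repetition(blocks: Matrix and List[Matrix]) -> Matrix:
    """Stack blocks M_0, ..., M_r, repeating each row of M_i exactly 2^(r-i) times."""
    r = len(blocks) - 1
    stacked: Matrix = []
    for i, block in enumerate(blocks):
        stacked.extend(repeat_rows(block, 2 ** (r - i)))
    return stacked


# Toy placeholder blocks (not outputs of an actual condenser-based construction).
toy_blocks = [[[1, 0]], [[0, 1], [1, 1]]]
toy_matrix = stack_with_repetition(toy_blocks)
```

With two blocks ($r = 1$), the single row of the first block is repeated twice and the rows of the second block appear once each, so the number of rows is $\sum_i r_i m_i$ as in the statement of the construction.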



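Theorem 4.31. The $m \times n$ matrix $M$ output by Construction 4.5 is
$(d, p\gamma 2^t; u)$-regular, where
$\gamma = \max\{1, \Omega_u(d \cdot \min\{2^{k(i)-\ell(i)} : i = 0, \ldots, r\})\}$.

Proof. As a first step, we verify the upper bound on the number of
measurements $m$. Each matrix $M_i$ has
$m_i = 2^{t+k(i)}\, O_u(2^{u(\ell(i)-k(i))})$ rows, and $M'_i$ has $m_i r_i$
rows, where $r_i = 2^{r-i}$. Therefore, the number of rows of $M$ is

\[
\sum_{i=0}^{r} r_i m_i
= \sum_{i=0}^{r} 2^{t + \log u' + r + 1}\, O_u\big(2^{u(\ell(i)-k(i))}\big)
= 2^t d \sum_{i=0}^{r} O_u\big(2^{u(\ell(i)-k(i))}\big).
\]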

Let $S, Z \subseteq \{0,1\}^{\tilde{n}}$ respectively denote any choice of a
critical set and zero set of size at most $d$, where $|Z| \le |S|$, and choose
an integer $i \ge 0$ so that $2^{i-1}u' \le |S| \le 2^i u'$. Arbitrarily grow
the two sets $S$ and $Z$ to possibly larger, and disjoint, sets
$S' \supseteq S$ and $Z' \supseteq Z$ such that $|S'| = |Z'| = 2^i u'$ (for
simplicity we have assumed that $d \le n/2$). Our goal is to show that there
are "many" rows of the matrix $M_i$ (in Construction 4.5) that $u$-satisfy $S$
and $Z$.

Let $k := k(i) = \log u' + i + 1$, $\ell := \ell(i)$, and denote by
$G_1, G_2, G_3$ the bipartite graphs used by the instantiation of
Construction 4.4 that outputs $M_i$. Thus we need to show that "many" right
vertices of $G_3$ are each connected to exactly $u$ of the vertices in $S$ and
none of those in $Z$.

Consider the uniform distribution $X$ on the set $S' \cup Z'$, which has
min-entropy $\log u' + i + 1$. By an averaging argument, since the condenser
$f_i$ is strong, for more than a $p$ fraction of the choices of the seed
$y \in \{0,1\}^t$ (call them good seeds), the distribution
$Z_y := f_i(X, y)$ is $\epsilon/(1-p)$-close (in particular, $1/16$-close) to
a distribution with min-entropy $\log u' + i + 1$.

Fix any good seed $y \in \{0,1\}^t$. Let
$G = (\{0,1\}^{\tilde{n}}, \{0,1\}^\ell, E)$ denote a bipartite graph
representation of $f_i$, where each left vertex $x \in \{0,1\}^{\tilde{n}}$ is
connected to $f_i(x, y)$ on the right. Denote by $\Gamma_y(S' \cup Z')$ the
right vertices of $G$ corresponding to the neighborhood of the set of left
vertices picked by $S' \cup Z'$. Note that
$\Gamma_y(S' \cup Z') = \mathrm{supp}(Z_y)$. Using Proposition 2.14 in the
appendix, we see that since $Z_y$ is $1/16$-close to having min-entropy
$\log(|S' \cup Z'|)$, there are at least $(7/8)|S' \cup Z'|$ vertices in
$\Gamma_y(S' \cup Z')$ that are each connected to exactly one left vertex in
$S' \cup Z'$. Since $|S| \ge |S' \cup Z'|/4$, this implies that at least
$|S' \cup Z'|/8$ vertices in $\Gamma_y(S' \cup Z')$ (call them $\Gamma'_y$)
are connected to exactly one left vertex in $S$ and no other vertex in
$S' \cup Z'$. In particular we get that

\[
|\Gamma'_y| \ge 2^{k-3}.
\]

Now, in $G_1$, let $T_y$ be the set of left vertices corresponding to
$\Gamma'_y$ (regarding the left vertices of $G_1$ in one-to-one correspondence
with the right vertices of $G$). The number of edges going out of $T_y$ in
$G_1$ is $d_\ell |T_y| \ge u2^k$. Therefore, as the number of the right
vertices of $G_1$ is $2^k$, there must be at least one right vertex that is
connected to at least $u$ vertices in $T_y$. Moreover, a counting argument
shows that the number of right vertices connected to at least $u$ vertices in
$T_y$ is also at least $2^{k-\ell}2^k/(10u)$.

Observe that in the construction of $G_2$ from $G_1$, any right vertex of
$G_1$ is replicated $\binom{d_r}{u}$ times, one for each $u$-subset of its
neighbors. Therefore, for a right vertex of $G_1$ that is connected to at
least $u$ left vertices in $T_y$, one or more of its copies in $G_2$ must be
connected to exactly $u$ vertices in $T_y$ (among the left vertices of $G_2$)
and no other vertex (since the right degree of $G_2$ is equal to $u$).

Define $\gamma' := \max\{1, 2^{k-\ell}2^k/(10u)\}$. From the previous argument
we know that, looking at $T_y$ as a set of left vertices of $G_2$, there are
at least $\gamma'$ right vertices on the neighborhood of $T_y$ in $G_2$ that
are connected to exactly $u$ of the vertices in $T_y$ and none of the left
vertices outside $T_y$. Letting $v_y$ be any such vertex, this implies that
the vertex $(y, v_y) \in V_3$ on the right part of $G_3$ is connected to
exactly $u$ of the vertices in $S$, and none of the vertices in $Z$. Since the
argument holds for every good seed $y$, the number of such vertices is at
least $\gamma'$ times the number of good seeds, and is thus more than
$p\gamma' 2^t$. Since the rows of the matrix $M_i$ are repeated
$r_i = 2^{r-i}$ times in $M$, we conclude that $M$ has at least
$p\gamma' 2^{t+r-i} \ge p\gamma 2^t$ rows that $u$-satisfy $S$ and $Z$, and the
claim follows. □

Instantiations 



We now instantiate the result obtained in Theorem 4.31 by various choices
of the family of lossless condensers. The crucial factors that influence the
number of measurements are the seed length and the output length of the
condenser. In particular, we will consider optimal lossless condensers (with
parameters achieved by random functions), the zig-zag based construction of
Theorem 4.19, and the coding-theoretic construction of Guruswami et al.,
quoted in Theorem 2.22. The results are summarized in the following theorem.



Theorem 4.32. Let $u > 0$ be fixed, and $p \in [0,1)$ be a real parameter.
Then for integer parameters $d, n \in \mathbb{N}$ where $u \le d \le n$,

1. Using an optimal lossless condenser in Construction 4.5 results in an
   $m_1 \times n$ matrix $M_1$ that is $(d, e_1; u)$-regular, where

   \[ m_1 = O\big(d(\log n)(\log d)/(1-p)^{u+1}\big) \]

   and $e_1 = \Omega(pd \log n)$,

2. Using the lossless condenser of Theorem 4.19 in Construction 4.5 results
   in an $m_2 \times n$ matrix $M_2$ that is $(d, e_2; u)$-regular, where

   \[ m_2 = O\big(T_2\, d(\log d)/(1-p)^u\big) \]

   for some

   \[ T_2 = \exp\big(O(\log^3((\log n)/(1-p)))\big) = \mathrm{quasipoly}(\log n), \]

   and $e_2 = \Omega(pdT_2(1-p))$.

3. Let $\beta > 0$ be any fixed constant. Then Construction 4.5 can be
   instantiated using the lossless condenser of Theorem 2.22 so that we obtain
   an $m_3 \times n$ matrix $M_3$ that is $(d, e_3; u)$-regular, where

   \[ m_3 = O\big(T_3^{1+u} d^{1+\beta} (\log d)\big) \]

   for

   \[ T_3 := \big((\log n)(\log d)/(1-p)\big)^{1+u/\beta} = \mathrm{poly}(\log n, \log d), \]

   and $e_3 = \Omega(p \max\{T_3, d^{1-\beta/u}\})$.

Proof. First we show the claim for $M_1$. In this case, we take each $f_i$ in
Construction 4.5 to be an optimal lossless condenser satisfying the bounds
obtained in$^{19}$ [23]. Thus we have that
$2^t = O(\tilde{n}/\epsilon) = O(\log n/\epsilon)$, and for every
$i = 0, \ldots, r$, we have $2^{\ell(i)-k(i)} = O(1/\epsilon)$, where
$\epsilon = O(1-p)$. Now we apply Theorem 4.31 to obtain the desired bounds
(and in particular, $\gamma = \Omega(\epsilon d)$).

Similarly, for the construction of $M_2$ we set up each $f_i$ using the
explicit construction of condensers in Theorem 4.19 for min-entropy $k(i)$. In
this case, the maximum required seed length is
$t = O(\log^3(\tilde{n}/\epsilon))$, and we let

\[
T_2 := 2^t = \exp\big(O(\log^3((\log n)/(1-p)))\big).
\]



19 This result is similar in spirit to the probabilistic argument used in 
the existence of good extractors.

Moreover, for every $i = 0, \ldots, r$, we have
$2^{\ell(i)-k(i)} = O(1/\epsilon)$. Plugging these parameters in Theorem 4.31
gives $\gamma = \Omega(\epsilon d)$ and the bounds on $m_2$ and $e_2$ follow.

Finally, for $M_3$ we use Theorem 2.22 with $\alpha := \beta/u$. Thus the
maximum seed length becomes

\[
t = (1 + u/\beta) \log\big(\tilde{n}(\log d)/(1-p)\big) + O(1),
\]

and for every $i = 0, \ldots, r$, we have
$\ell(i) - k(i) = O(t + \beta(\log d)/u)$. Clearly, $T_3 = \Theta(2^t)$, and
thus (using Theorem 4.31) the number of measurements becomes
$m_3 = O(T_3^{1+u} d^{1+\beta} (\log d))$. Moreover, we get

\[
\gamma = \max\{1, \Omega(d^{1-\beta/u}/T_3)\},
\]

which gives

\[
e_3 = \Omega(p\gamma T_3) = \Omega\big(p \max\{T_3, d^{1-\beta/u}\}\big),
\]

as claimed. □



By combining this result with Lemma 4.29 using any explicit construction
of classical disjunct matrices, we will obtain $(d, e; u)$-disjunct matrices
that can be used in the threshold model with any fixed threshold, sparsity
$d$, and error tolerance $\lfloor e/2 \rfloor$.

In particular, using the coding-theoretic explicit construction of nearly
optimal classical disjunct matrices (see Table 4.2), we obtain
$(d, e; u)$-disjunct matrices with

\[ m = O\big(m' d^2 (\log n)/(1-p)^2\big) \]

rows and error tolerance

\[ e = \Omega\big(e' p d (\log n)/(1-p)\big), \]

where $m'$ and $e'$ are respectively the number of rows and error tolerance of
any of the regular matrices obtained in Theorem 4.32.

We note that in all cases, the final dependence on the sparsity parameter
$d$ is, roughly, $O(d^3)$, which has an exponent independent of the threshold
$u$. Table 4.3 summarizes the obtained parameters for the general case (with
arbitrary gaps). We see that, when $d$ is not negligibly small (e.g.,
$d = n^{1/10}$), the bounds obtained by our explicit constructions are
significantly better than those offered by strongly disjunct matrices (as in
Table 4.2).



4.3.3.3 The Case with Positive Gaps 

Table 4.3: Summary of the parameters achieved by various threshold testing
schemes. The noise parameter $p \in [0,1)$ is arbitrary, and thresholds
$\ell$, $u = \ell + g$ are fixed constants. "Exp" and "Rnd" respectively
indicate explicit and randomized constructions.

Number of rows | Tolerable errors | Remarks
… | … | Rnd: Construction 4.3
… | … | Constructions 4.5 and 4.2 combined, assuming optimal condensers and strongly disjunct matrices
$O(d^{g+3} (\log d)\, T_2 \log n/(1-p)^{g+2})$ | … | Exp (*)
$O(d^{g+3+\beta} \cdots)$ | … | Exp (**)
$\Omega(d^{g+2} \log_d n + e d^{g+1})$ | $e$ | Lower bound (see Section 4.3.3.3)

(*) Constructions 4.5 and 4.2 combined using Theorem 4.19 and [120], where
$T_2 = \exp(O(\log^3 \log n)) = \mathrm{quasipoly}(\log n)$.

(**) Constructions 4.5 and 4.2 combined using Theorem 2.22 and [120], where
$\beta > 0$ is any arbitrary constant and
$T_3 = ((\log n)(\log d))^{1+u/\beta} = \mathrm{poly}(\log n, \log d)$.

In preceding sections we have focused on the case where $g = 0$. However, we
observe that all the techniques that we have developed so far can be extended
to the positive-gap case in a straightforward way. The main observations are
as follows.



1. Definition 4.26 can be adapted to allow more than a single distinguished
   column in disjunct matrices. In particular, in general we may require
   the matrix $M$ to have more than $e$ rows that $u$-satisfy every choice
   of a critical set $S$, a zero set $Z$, and any $g + 1$ designated columns
   $D \subseteq S$ (at which all entries of the corresponding rows must be 1).
   Denote this generalized notion by $(d, e; u, g)$-disjunct matrices. It is
   straightforward to extend the arguments of Lemma 4.27 to show that
   the generalized notion of $(d, e; u, g)$-disjunct matrices is necessary and
   sufficient to capture non-adaptive threshold group testing with upper
   threshold $u$ and gap $g$.

2. Lemma 4.30 can be generalized to show that Construction 4.3 (with
   probability $1 - o(1)$) results in a
   $(d, \Omega_u(pd \log(n/d)/(1-p)^2); u, g)$-disjunct matrix if the number
   of measurements is increased by a factor $O(d^g)$.

3. Lemma 4.28 can be extended to positive gaps, by taking $M_1$ as a
   $(d-1, e_1; \ell-1)$-regular matrix, provided that, for every
   $y \in M_2[x]_{1,g+1}$ and $y' \in M_2[x']_{1,g+1}$, we have
   $|\mathrm{supp}(y) \setminus \mathrm{supp}(y')| > e_2$. In particular this
   is the case if $M_2$ is strongly $(d, e_2 - 1; g+1)$-disjunct$^{20}$.
   Similarly for Lemma 4.29, $M_2$ must be taken as a strongly
   $(2d, e_2; g+1)$-disjunct matrix. Consequently, using the coding-theoretic
   construction of strongly disjunct matrices described in Section 4.3.2, our
   explicit constructions of $(d, e; u)$-disjunct matrices can be extended to
   the gap model at the cost of a factor $O(d^g)$ increase in the number of
   measurements (as summarized in Table 4.3).

4. Observe that a $(d, e; u, g)$-disjunct matrix is, in particular, strongly
   $(d-g, e; g+1)$-disjunct and thus, the lower bound
   $\Omega(d^{g+2} \log_d n + e d^{g+1})$ on the number of rows of strongly
   disjunct matrices applies to them as well.

4.4 Notes 

The notion of $d$-disjunct matrices is also known in certain equivalent forms;
e.g., $d$-superimposed codes, $d$-separable matrices, or $d$-cover-free
families (cf. [50]).

The special case of Definition 4.7 corresponding to $(0, 0, e', 0)$-resilient
matrices is related to the notion of selectors in [43] and resolvable matrices
in [56]. Lemma 4.10 is similar in spirit to the lower bound obtained in [43]
for the size of selectors.

The notion of strongly disjunct matrices, in its general form, has been 
studied in the literature under different names and equivalent formulations, 



e.g., superimposed (u, d)-designs/codes and (u,d) cover-free families (see 26 
[28}[53}[tlJ[l44j[l45] and the references therein). 



4.A Some Technical Details

For a positive integer $c > 1$, define a $c$-hypergraph as a tuple $(V, E)$,
where $V$ is the set of vertices and $E$ is the set of hyperedges such that
every $e \in E$ is a subset of $V$ of size $c$. The degree of a vertex
$v \in V$, denoted by $\deg(v)$, is the size of the set
$\{e \in E : v \in e\}$. Note that $|E| \le \binom{|V|}{c}$ and
$\deg(v) \le \binom{|V|}{c-1}$. The density of the hypergraph is given by
$|E|/\binom{|V|}{c}$. A vertex cover on the hypergraph is a subset of vertices
that contains at least one vertex from every hyperedge. A matching is a set of
pairwise disjoint hyperedges. It is well known that any dense hypergraph must
have a large matching. Below we reconstruct a proof of this claim.

Proposition 4.33. Let H be a c-hypergraph such that every vertex cover of 
H has size at least k. Then H has a matching of size at least k/c. 

Proof. Let $M$ be a maximal matching of $H$, i.e., a matching that cannot be
extended by adding further hyperedges. Let $C$ be the set of all vertices that
participate in hyperedges of $M$. Then $C$ has to be a vertex cover, as
otherwise one could add an uncovered hyperedge to $M$ and violate maximality
of $M$. Hence, $c|M| = |C| \ge k$, and the claim follows. □

20 Here we are also considering the unavoidable assumption that
$\max\{|\mathrm{supp}(x) \setminus \mathrm{supp}(x')|, |\mathrm{supp}(x') \setminus \mathrm{supp}(x)|\} > g$.

Lemma 4.34. Let $H = (V, E)$ be a $c$-hypergraph with density at least
$\epsilon > 0$. Then $H$ has a matching of size at least
$\frac{\epsilon}{c^2}(|V| - c + 1)$.

Proof. For every subset $S \subseteq V$ of size $c$, denote by
$\mathbb{1}(S)$ the indicator value of $S$ being in $E$. Let $C$ be any vertex
cover of $H$. Denote by $\mathcal{S}$ the set of all subsets of $V$ of size
$c$. Then we have

\[
\epsilon \binom{|V|}{c} \le \sum_{S \in \mathcal{S}} \mathbb{1}(S)
\le \sum_{v \in C} \deg(v) \le |C| \binom{|V|}{c-1}.
\]

Hence, $|C| \ge \epsilon(|V| - c + 1)/c$, and the claim follows using
Proposition 4.33. □
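The argument of Proposition 4.33 is constructive and can be illustrated with a short Python sketch. It greedily builds a maximal matching, collects the vertices it touches, and checks that they form a vertex cover of size exactly $c|M|$. The greedy scan is just one convenient way to obtain a maximal matching (any maximal matching works in the proof), and the function names and the toy hypergraph are hypothetical.

```python
from itertools import combinations
from math import comb


def maximal_matching(edges):
    """Greedily pick pairwise disjoint hyperedges until none can be added.

    Returns the matching and the set of vertices covered by it.
    """
    covered, matching = set(), []
    for edge in edges:
        if covered.isdisjoint(edge):
            matching.append(edge)
            covered.update(edge)
    return matching, covered


def is_vertex_cover(cover, edges):
    """True if every hyperedge contains at least one vertex of `cover`."""
    return all(not cover.isdisjoint(edge) for edge in edges)


# Toy 3-hypergraph on 9 vertices: all triples with an even sum of labels.
num_vertices, c = 9, 3
toy_edges = [e for e in combinations(range(num_vertices), c) if sum(e) % 2 == 0]
toy_matching, toy_cover = maximal_matching(toy_edges)
density = len(toy_edges) / comb(num_vertices, c)
```

On this toy instance one can confirm that `toy_cover` is a vertex cover with `len(toy_cover) == c * len(toy_matching)`, and that the matching size is at least `density * (num_vertices - c + 1) / c**2`, in line with Lemma 4.34.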



Frédéric Chopin (1810-1849): Ballade Op. 38 No. 2 in F major.



"How is an error possible in 
mathematics?" 

— Henri Poincare 



Chapter 5 



Capacity Achieving Codes 



One of the basic goals of coding theory is coming up with efficient
constructions of error-correcting codes that allow reliable transmission of
information over discrete communication channels. Already in the seminal work
of Shannon [136], the notion of channel capacity was introduced, which is a
characteristic of the communication channel that determines the maximum rate
at which reliable transmission of information (i.e., with vanishing error
probability) is possible. However, Shannon's result did not focus on the
feasibility of the underlying code and was mainly concerned with the existence
of reliable, albeit possibly complex, coding schemes. Here feasibility can
refer to a combination of several criteria, including: succinct description of
the code and its efficient computability, the existence of an efficient
encoder and an efficient decoder, the error probability, and the set of
message lengths for which the code is defined.

Besides heuristic attempts, there is a large body of rigorous work in the
literature on coding theory with the aim of designing feasible capacity
approaching codes for various discrete channels, most notably, the natural and
fundamental cases of the binary erasure channel (BEC) and binary symmetric
channel (BSC). Some notable examples in "modern coding" include Turbo codes
and sparse graph codes (e.g., LDPC codes and Fountain codes, cf.
[13, 125, 137]). These classes of codes are either known or strongly believed
to contain capacity achieving ensembles for the erasure and symmetric
channels. While such codes are very appealing both theoretically and
practically, and are in particular designed with efficient decoding in mind,
in this area there still is a considerable gap between what we can prove and
what is evidenced by practical results, mainly due to the complex
combinatorial structure of the code constructions. Moreover, almost all known
code constructions in this area involve a considerable amount of randomness,
which makes them prone to a possibility of design failure (e.g., choosing an
"unfortunate" degree sequence for an LDPC code). While the chance of such
possibilities is typically small, in general there is no known efficient way
to certify whether a particular outcome of the code construction is
satisfactory. Thus, it is desirable to come up with constructions of provably
capacity achieving code families that are explicit, i.e., are efficient and do
not involve any randomness.

Explicit construction of capacity achieving codes was considered as early
as the classic work of Forney [60], who showed that concatenated codes can
achieve the capacity of various memoryless channels. In this construction, an
outer MDS code is concatenated with an inner code with small block length
that can be found in reasonable time by brute force search. An important
subsequent work by Justesen [87] (that was originally aimed at explicit
construction of asymptotically good codes) shows that it is possible to
eliminate the brute force search by varying the inner code used for encoding
different symbols of the outer encoding, provided that the ensemble of inner
codes contains a large fraction of capacity achieving codes.

Recently, Arikan [7] gave a framework for deterministic construction of
capacity achieving codes for discrete memoryless channels (DMCs) with binary
input that are equipped with efficient encoders and decoders and attain
slightly worse than exponentially small error probability. These codes are
defined for every block length that is a power of two, which might be
considered a restrictive requirement. Moreover, the construction is currently
explicit (in the sense of polynomial-time computability of the code
description) only for the special case of BEC and requires exponential time
otherwise.

In this chapter, we revisit the concatenation scheme of Justesen and give
new constructions of the underlying ensemble of the inner codes. The code
ensemble used in Justesen's original construction is attributed to Wozencraft.
Other ensembles that are known to be useful in this scheme include the
ensemble of Goppa codes and shortened cyclic codes (see [127, Chapter 12]).
The number of codes in these ensembles is exponential in the block length
and they achieve exponentially small error probability. These ensembles are
also known to achieve the Gilbert-Varshamov bound, and owe their capacity
achieving properties to the property that each nonzero vector belongs to a
small number of the codes in the ensemble.

Here, we will use extractors and lossless condensers to construct much 
smaller ensembles with similar, random-like, properties. The quality of the 
underlying extractor or condenser determines the quality of the resulting code 
ensemble. In particular, the size of the code ensemble, the decoding error and 
proximity to the channel capacity are determined by the seed length, the error, 
and the output length of the extractor or condenser being used. 

As a concrete example, we will instantiate our construction with appropriate
choices of the underlying condenser (or extractor) and obtain, for every
block length $n$, a capacity achieving ensemble of size $2^n$ that attains
exponentially small error probability for both erasure and symmetric channels
(as well as the broader range of channels described above), and an ensemble of
quasipolynomial$^1$ size $2^{O(\log^3 n)}$ that attains the capacity of BEC.
Using nearly optimal extractors and condensers that require logarithmic seed
lengths, it is possible to obtain polynomially small capacity achieving
ensembles for any block length.

Finally, we apply our constructions to Justesen's concatenation scheme to 
obtain an explicit construction of capacity-achieving codes for both BEC and 
BSC that attain exponentially small error, as in the original construction of 
Forney. Moreover, the running time of the encoder is almost linear in the block 
length, and decoding takes almost linear time for BEC and almost quadratic 
time for BSC. Using our quasipolynomial-sized ensemble as the inner code, we 
are able to construct a fully explicit code for BEC that is defined and capacity 
achieving for every choice of the message length. 

5.1 Discrete Communication Channels 

A discrete communication channel is a randomized process that takes a
potentially infinite stream of symbols $X_0, X_1, \ldots$ from an input
alphabet $\Sigma$ and outputs an infinite stream $Y_0, Y_1, \ldots$ from an
output alphabet $\Gamma$. The indices intuitively represent the time, and each
output symbol is only determined from what the channel has observed in the
past. More precisely, given $X_0, \ldots, X_t$, the output symbol $Y_t$ must be
independent of $X_{t+1}, X_{t+2}, \ldots$. Here we will concentrate on finite
input and finite output channels, that is, the alphabets $\Sigma$ and $\Gamma$
are finite. In this case, the conditional distribution $p(Y_t|X_t)$ of each
output symbol $Y_t$ given the input symbol $X_t$ can be written as a
stochastic $|\Sigma| \times |\Gamma|$ transition matrix, where each row is a
probability distribution.

Of particular interest is a memoryless channel, which is intuitively
"oblivious" of the past. In this case, the transition matrix is independent of
the time instance. That is, we have $p(Y_t|X_t) = p(Y_0|X_0)$ for every $t$.
When the rows of the transition matrix are permutations of one another and so
is the case for the columns, the channel is called symmetric. For example, the
channel defined by

\[
p(Y|X) = \begin{pmatrix} 0.4 & 0.1 & 0.5 \\ 0.5 & 0.4 & 0.1 \\ 0.1 & 0.5 & 0.4 \end{pmatrix}
\]

is symmetric. Intuitively, a symmetric channel does not "read" the input
sequence. An important class of symmetric channels is defined by additive
noise. In an additive noise channel, the input and output alphabets are the
same finite field $\mathbb{F}_q$ and each output symbol $Y_t$ is obtained from
$X_t$ using

\[
Y_t = X_t + Z_t,
\]

where the addition is over $\mathbb{F}_q$ and the channel noise
$Z_t \in \mathbb{F}_q$ is chosen independently of the input sequence$^2$.
Typically $Z_t$ is also independent of time $t$, in which case we get a
memoryless additive noise channel. For a noise distribution $Z$, we denote the
memoryless additive noise channel over the input (as well as output) alphabet
$\Sigma$ by $SC(\Sigma, Z)$.

1 A quantity $f(n)$ is said to be quasipolynomial in $n$ (denoted by
$f(n) = \mathrm{quasipoly}(n)$) if $f(n) = 2^{(\log n)^{O(1)}}$.
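The symmetry condition defined above (rows are permutations of one another, and likewise the columns) can be checked mechanically. The following Python sketch is an illustration with hypothetical function names; it tests the example matrix from the text and also builds the transition matrix of an additive noise channel over $\mathbb{Z}/q\mathbb{Z}$ from a noise distribution given as a list of probabilities.

```python
def is_symmetric(transition):
    """True if all rows are permutations of one another, and so are all columns."""
    rows = [sorted(row) for row in transition]
    cols = [sorted(col) for col in zip(*transition)]
    return all(r == rows[0] for r in rows) and all(c == cols[0] for c in cols)


def additive_noise_matrix(noise, q):
    """Transition matrix of Y = X + Z over Z/qZ, where noise[z] = Pr[Z = z].

    Entry (x, y) is the probability of receiving y when x is sent.
    """
    return [[noise[(y - x) % q] for y in range(q)] for x in range(q)]


example = [
    [0.4, 0.1, 0.5],
    [0.5, 0.4, 0.1],
    [0.1, 0.5, 0.4],
]
ternary_additive = additive_noise_matrix([0.7, 0.2, 0.1], 3)
```

Here `is_symmetric(example)` holds for the matrix given in the text, and the additive noise matrix is symmetric as well because each of its rows is a cyclic shift of the noise distribution.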

Note that the notion of additive noise channels can be extended to the
case where the input and output alphabets are vector spaces $\mathbb{F}_q^n$,
and the noise distribution is a probability distribution over
$\mathbb{F}_q^n$. By considering an isomorphism between $\mathbb{F}_q^n$ and
the field extension $\mathbb{F}_{q^n}$, such a channel is essentially an
additive noise channel $SC(\mathbb{F}_{q^n}, Z)$, where $Z$ is a noise
distribution over $\mathbb{F}_{q^n}$. On the other hand, the channel
$SC(\mathbb{F}_{q^n}, Z)$ can be regarded as a "block-wise memoryless" channel
over the alphabet $\mathbb{F}_q$. Namely, in a natural way, each channel use
over the alphabet $\mathbb{F}_{q^n}$ can be regarded as $n$ subsequent uses of
a channel over the alphabet $\mathbb{F}_q$. When regarding the channel over
$\mathbb{F}_q$, it does not necessarily remain memoryless since the additive
noise distribution $Z$ can be an arbitrary distribution over
$\mathbb{F}_{q^n}$ and is not necessarily expressible as a product
distribution over $\mathbb{F}_q$. However, the noise distributions of blocks
of $n$ subsequent channel uses are independent from one another and form a
product distribution (since the original channel $SC(\mathbb{F}_{q^n}, Z)$ is
memoryless over $\mathbb{F}_{q^n}$). Often by choosing larger and larger
values of $n$ and letting $n$ grow to infinity, it is possible to obtain good
approximations of a non-memoryless additive noise channel using memoryless
additive noise channels over large alphabets.

An important additive noise channel is the $q$-ary symmetric channel, which
is defined by a (typically small) noise parameter $p \in [0, 1)$. For this
channel, the noise distribution $Z$ has a probability mass $1 - p$ on zero,
and $p/(q-1)$ on every nonzero alphabet letter. A fundamental special case is
the binary symmetric channel (BSC), which corresponds to the case $q = 2$ and
is denoted by BSC($p$).

Another fundamentally important channel is the binary erasure channel.
The input alphabet for this channel is $\{0, 1\}$ and the output alphabet is
the set $\{0, 1, ?\}$. The transition is characterized by an erasure
probability $p \in [0, 1)$. A transmitted symbol is output intact by the
channel with probability $1 - p$. However, with probability $p$, a special
erasure symbol "?" is delivered by the channel. The behavior of the binary
symmetric channel BSC($p$) and binary erasure channel BEC($p$) is
schematically described by Figure 5.1.
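The two channels just described are easy to simulate, which can help build intuition for their behavior. The sketch below is a minimal illustration; the function names are hypothetical and the random generator is passed in explicitly so that runs are reproducible.

```python
import random


def bsc(bits, p, rng):
    """Binary symmetric channel: flip each bit independently with probability p."""
    return [b ^ int(rng.random() < p) for b in bits]


def bec(bits, p, rng):
    """Binary erasure channel: replace each bit by '?' independently with probability p."""
    return ["?" if rng.random() < p else b for b in bits]


rng = random.Random(2024)
sent = [0, 1] * 10000
received_bsc = bsc(sent, 0.1, rng)
received_bec = bec(sent, 0.3, rng)

# Empirical fraction of flipped positions; it should be close to p = 0.1.
flip_rate = sum(s != r for s, r in zip(sent, received_bsc)) / len(sent)
```

Note that a BSC output never reveals which positions were corrupted, whereas every BEC output symbol is either the transmitted bit or the explicit erasure symbol.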

A channel encoder $\mathcal{E}$ for a channel $\mathcal{C}$ with input
alphabet $\Sigma$ and output alphabet $\Gamma$ is a mapping
$\mathcal{E}\colon \{0,1\}^k \to \Sigma^n$. A channel decoder, on the other
hand, is a mapping $\mathcal{D}\colon \Gamma^n \to \{0,1\}^k$. A channel
encoder and a channel decoder collectively describe a channel code. Note that
the image of the encoder mapping defines a block code of length $n$ over the
alphabet $\Sigma$. The parameter $n$ defines the block length of the code. For
a sequence $Y \in \Sigma^n$, denote by the random variable $\mathcal{C}(Y)$ a
sequence $\hat{Y} \in \Gamma^n$ that is output by the channel, given the input
$Y$.

2 In fact, since we are only using the additive structure of $\mathbb{F}_q$,
it can be replaced by any additive group, and in particular, the ring
$\mathbb{Z}/q\mathbb{Z}$ for an arbitrary integer $q > 1$. This way, $q$ does
not need to be restricted to a prime power.

Figure 5.1: The binary symmetric channel (left) and binary erasure channel
(right). On each graph, the left part corresponds to the input alphabet and
the right part to the output alphabet. Conditional probability of each output
symbol given an input symbol is shown by the labels on the corresponding
arrows.

Intuitively, a channel encoder adds sufficient redundancy to a given
"message" $X \in \{0,1\}^k$ (that is without loss of generality modeled as a
binary string of length $k$), resulting in an encoded sequence
$Y \in \Sigma^n$ that can be fed into the channel. The channel manipulates the
encoded sequence and delivers a sequence $\hat{Y} \in \Gamma^n$ to a recipient
whose aim is to recover $X$. The recovery process is done by applying the
channel decoder on the received sequence $\hat{Y}$. The transmission is
successful when $\mathcal{D}(\hat{Y}) = X$. Since the channel behavior is not
deterministic, there might be a nonzero probability, known as the error
probability, that the transmission is unsuccessful. More precisely, the error
probability of a channel code is defined as

\[
p_e := \sup_{x \in \{0,1\}^k} \Pr\big[\mathcal{D}(\mathcal{C}(\mathcal{E}(x))) \ne x\big],
\]

where the probability is taken over the randomness of $\mathcal{C}$. A
schematic diagram of a simple communication system consisting of an encoder,
point-to-point channel, and decoder is shown in Figure 5.2.



[Figure 5.2 diagram: $X = (X_1, \ldots, X_k)$ → Channel Encoder → redundant
encoding $Y = (Y_1, \ldots, Y_n)$ → Channel $p(\hat{Y} \mid Y)$ → received sequence
$\hat{Y} = (\hat{Y}_1, \ldots, \hat{Y}_n)$ → Channel Decoder → estimate $\hat{X}$]

Figure 5.2: The schematic diagram of a point-to-point communication system.
The stochastic behavior of the channel is captured by the conditional
probability distribution $p(\hat{Y} \mid Y)$.
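As a concrete instance of the error probability $p_e$ defined above, the following minimal sketch computes it exactly for the length-$n$ repetition code ($k = 1$) over BSC($p$) under majority decoding. The repetition code and the function name `repetition_error_probability` are illustrative choices and do not appear in the text.

```python
from math import comb


def repetition_error_probability(n: int, p: float) -> float:
    """Exact p_e of the length-n repetition code (k = 1) over BSC(p).

    Majority decoding fails exactly when more than n/2 of the n transmitted
    bits are flipped. By symmetry the failure probability is the same for
    both messages, so the supremum over messages is attained by either one.
    """
    if n % 2 == 0:
        raise ValueError("use an odd block length so majority voting has no ties")
    return sum(
        comb(n, w) * p**w * (1 - p) ** (n - w)
        for w in range((n + 1) // 2, n + 1)
    )


if __name__ == "__main__":
    for n in (1, 3, 5, 11):
        print(n, repetition_error_probability(n, 0.1))
```

The error probability decreases with $n$, but the rate $1/n$ tends to zero, which is why the notion of capacity asks for a small error probability at a fixed positive rate.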
For linear codes over additive noise channels, it is often convenient to
work with syndrome decoders. Consider a linear code with generator and
parity check matrices $G$ and $H$, respectively. The encoding of a message
$x$ (considered as a row vector) can thus be written as $xG$. Suppose that
the encoded sequence is transmitted over an additive noise channel, which
produces a noisy sequence $y := xG + z$, for a noise vector $z$ chosen randomly
according to the channel distribution. The receiver receives the sequence $y$
and, without loss of generality, the decoder's task is to obtain an estimate
of the noise realization $z$ from $y$. Now, observe that

$$H y^\top = H G^\top x^\top + H z^\top = H z^\top,$$

where the last equality is due to the orthogonality of the generator and
parity check matrices. Therefore, $H z^\top$ is available to the decoder and thus,
in order to decode the received sequence, it suffices to obtain an estimate
of the noise sequence $z$ from the syndrome $H z^\top$. A syndrome decoder is a
function that, given the syndrome, outputs an estimate of the noise sequence
(note that this is independent of the codeword being sent). The error
probability of a syndrome decoder can be simply defined as the probability
(over the noise randomness) that it obtains an incorrect estimate of the
noise sequence. Obviously, the error probability of a syndrome decoder upper
bounds the error probability of the channel code.
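The syndrome decoding idea can be sketched on a small example. The code below uses the $[7,4]$ Hamming code, which is an illustrative choice not taken from the text, and builds a lookup table from syndromes to lowest-weight noise vectors. The helper names `syndrome`, `build_syndrome_table` and `decode` are hypothetical.

```python
import itertools

# Parity check matrix of the [7,4] Hamming code: column j (1-based) is the
# binary expansion of j. This particular code is an illustrative assumption.
H = [[(j >> b) & 1 for j in range(1, 8)] for b in range(3)]


def syndrome(H, v):
    """Compute H v^T over F_2."""
    return tuple(sum(h * x for h, x in zip(row, v)) % 2 for row in H)


def build_syndrome_table(H, n, max_weight):
    """Map each syndrome to a lowest-weight noise vector that produces it."""
    table = {}
    for w in range(max_weight + 1):
        for positions in itertools.combinations(range(n), w):
            z = [0] * n
            for i in positions:
                z[i] = 1
            # setdefault keeps the first (lowest-weight) noise vector seen.
            table.setdefault(syndrome(H, z), tuple(z))
    return table


table = build_syndrome_table(H, 7, 1)


def decode(y):
    """Estimate the noise from the syndrome alone, then subtract it from y."""
    z_hat = table[syndrome(H, y)]
    return tuple((a + b) % 2 for a, b in zip(y, z_hat))
```

Note that `decode` never looks at the codeword itself: the estimate of the noise depends only on $H y^\top = H z^\top$, as in the derivation above.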

The rate of a channel code (in bits per channel use) is defined as the
quantity $k/n$. We call a rate $r \ge 0$ feasible if for every $\epsilon > 0$, there is a
channel code with rate $r$ and error probability at most $\epsilon$. The rate of a
channel code describes its efficiency; the larger the rate, the more
information can be transmitted through the channel in a given "time frame".
A fundamental question is, given a channel $\mathcal{C}$, to find the largest possible
rate at which reliable transmission is possible. In his fundamental work,
Shannon [136] introduced the notion of channel capacity that answers this
question. Shannon capacity can be defined using purely information-theoretic
terminology. However, for the purposes of this chapter, it is more convenient
to use the following, more "computational", definition which turns out to be
equivalent to the original notion of Shannon capacity:

$$\mathrm{Cap}(\mathcal{C}) := \sup\{r \mid r \text{ is a feasible rate for the channel } \mathcal{C}\}.$$

The capacity of memoryless symmetric channels has a particularly nice form.
Let $\mathcal{Z}$ denote the probability distribution defined by any of the rows of the
transition matrix of a memoryless symmetric channel $\mathcal{C}$ with output alphabet
$\Gamma$. Then, the capacity of $\mathcal{C}$ is given by

$$\mathrm{Cap}(\mathcal{C}) = \log_2 |\Gamma| - H(\mathcal{Z}),$$

where $H(\cdot)$ denotes the Shannon entropy [40, Section 7.2]. In particular, the
capacity of the binary symmetric channel BSC($p$) (in bits per channel use) is
equal to

$$1 - h(p) = 1 + p \log_2 p + (1 - p) \log_2 (1 - p).$$

The capacity of the binary erasure channel BEC($p$) is moreover known to be
$1 - p$ [40, Section 7.1].
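The capacity formulas above are easy to evaluate numerically. The following is a small sketch; the function names are illustrative and not taken from the text. The general formula is applied here only to a symmetric channel (every row of the transition matrix is a permutation of the same distribution), so it is not used for the erasure channel.

```python
from math import log2


def binary_entropy(p: float) -> float:
    """h(p) = -p log2 p - (1 - p) log2 (1 - p), with h(0) = h(1) = 0."""
    if p in (0.0, 1.0):
        return 0.0
    return -p * log2(p) - (1 - p) * log2(1 - p)


def bsc_capacity(p: float) -> float:
    """Capacity 1 - h(p) of the binary symmetric channel BSC(p)."""
    return 1.0 - binary_entropy(p)


def bec_capacity(p: float) -> float:
    """Capacity 1 - p of the binary erasure channel BEC(p)."""
    return 1.0 - p


def symmetric_channel_capacity(row, output_alphabet_size: int) -> float:
    """log2|Gamma| - H(Z), where `row` is any row of the transition matrix
    of a memoryless symmetric channel."""
    entropy = -sum(q * log2(q) for q in row if q > 0)
    return log2(output_alphabet_size) - entropy


if __name__ == "__main__":
    for p in (0.05, 0.11, 0.25):
        print(p, bsc_capacity(p), bec_capacity(p))
```

For the same parameter $p$, the erasure channel has the larger capacity, since $h(p) \ge p$ for $p \in [0, 1/2]$.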

A family of channel codes of rate $r$ is an infinite set of channel codes,
such that for every (typically small) rate loss $\delta \in (0, r)$ and block length
$n$, the family contains a code $C(n, \delta)$ of length at least $n$ and rate at
least $r - \delta$. The family is called explicit if there is a deterministic
algorithm that, given $n$ and $\delta$ as parameters, computes the encoder function
of the code $C(n, \delta)$ in polynomial time in $n$. For linear channel codes, this
is equivalent to computing a generator or parity check matrix of the code in
polynomial time. If, additionally, the algorithm receives an auxiliary index
$i \in [s]$, for a size parameter $s$ depending on $n$ and $\delta$, we instead get an
ensemble of size $s$ of codes. An ensemble can be interpreted as a set of
codes of length $n$ and rate at least $r - \delta$ each, that contains a code for
each possibility of the index $i$.

We call a family of codes capacity achieving for a channel $\mathcal{C}$ if the family
is of rate $\mathrm{Cap}(\mathcal{C})$ and moreover, the code $C(n, \delta)$ as described above can be
chosen to have an arbitrarily small error probability for the channel $\mathcal{C}$. If
the error probability decays exponentially with the block length $n$; i.e.,
$p_e = O(2^{-\gamma n})$, for a constant $\gamma > 0$ (possibly depending on the rate loss),
then the family is said to achieve an error exponent $\gamma$. We call the family
capacity achieving for all lengths if it is capacity achieving and moreover,
there is an integer constant $n_0$ (depending only on the rate loss $\delta$) such
that for every $n \ge n_0$, the code $C(n, \delta)$ can be chosen to have length
exactly $n$.

5.2 Codes for the Binary Erasure Channel 

Any code with minimum distance $d$ can tolerate up to $d - 1$ erasures in the
worst case³. Thus one way to ensure reliable communication over BEC($p$)
is to use binary codes with relative minimum distance of about $p$. However,
known negative bounds on the rate-distance trade-off (e.g., the sphere packing
and MRRW bounds) do not allow the rate of such codes to approach the
capacity $1 - p$. On the other hand, by imposing the weaker requirement that
most of the erasure patterns should be recoverable, it is possible to attain
the capacity with a positive, but arbitrarily small, error probability (as
guaranteed by the definition of capacity).

In this section, we consider a different relaxation that preserves the
worst-case guarantee on the erasure patterns; namely we consider ensembles of
linear codes with the property that any pattern of up to a $p$ fraction of
erasures must be tolerable by all but a negligible fraction of the codes in
the ensemble. This in particular allows us to construct ensembles in which
all but a negligible fraction of the codes are capacity achieving for BEC.
Note that as we are only considering linear codes, recoverability from a
particular erasure pattern $S \subseteq [n]$ (where $n$ is the block length) is a
property of the code and independent of the encoded sequence.

3 See Appendix A for a quick review of the basic notions in coding theory.

Now we introduce two constructions, which employ strong, linear extractors
and lossless condensers as their main ingredients. Throughout this section
we denote by $f\colon \mathbb{F}_2^n \times \mathbb{F}_2^d \to \mathbb{F}_2^r$ a strong, linear, lossless condenser for
min-entropy $m$ and error $\epsilon$ and by $g\colon \mathbb{F}_2^n \times \mathbb{F}_2^{d'} \to \mathbb{F}_2^k$ a strong, linear
extractor for min-entropy $n - m$ and error $\epsilon'$. We assume that the errors $\epsilon$
and $\epsilon'$ are substantially small. Using this notation, we define the
ensembles $\mathcal{F}$ and $\mathcal{G}$ as in Construction 5.1.

Obviously, the rate of each code in $\mathcal{F}$ is at least $1 - r/n$. Moreover, as $g$
is a strong extractor we can assume without loss of generality that the rank
of each $G_u$ is exactly⁴ $k$. Thus, each code in $\mathcal{G}$ has rate $k/n$. Lemma 5.2
below is our main tool in quantifying the erasure decoding capabilities of
the two ensembles. Before stating the lemma, we mention a proposition showing
that linear condensers applied on affine sources achieve either zero or large
errors:

Proposition 5.1. Suppose that a distribution $\mathcal{X}$ is uniformly supported on an
affine $k$-dimensional subspace of $\mathbb{F}_q^n$. Consider a linear function
$f\colon \mathbb{F}_q^n \to \mathbb{F}_q^m$, and define the distribution $\mathcal{Y}$ as $\mathcal{Y} := f(\mathcal{X})$. Suppose that,
for some integer $k$ and $\epsilon < 1/2$, $\mathcal{Y}$ is $\epsilon$-close to having either min-entropy
$m \log q$ or at least $k \log q$. Then, $\epsilon = 0$.

Proof. By linearity, $\mathcal{Y}$ is uniformly supported on an affine subspace $A$ of
$\mathbb{F}_q^m$. Let $k' \le m$ be the dimension of this subspace, and observe that
$k' \le k$.



4 This causes no loss of generality since, if the rank of some $G_u$ is not
maximal, one of the $k$ symbols output by the linear function $g(\cdot, u)$ would
linearly depend on the others and thus, the function would fail to be an
extractor for any source (so one can arbitrarily modify $g(\cdot, u)$ to have rank
$k$ without negatively affecting the parameters of the extractor $g$).



Ensemble $\mathcal{F}$: Define a code $C_u$ for each seed $u \in \mathbb{F}_2^d$ as follows: Let $H_u$
denote the $r \times n$ matrix that defines the linear function $f(\cdot, u)$, i.e.,
for each $x \in \mathbb{F}_2^n$, $H_u \cdot x = f(x, u)$. Then $H_u$ is a parity check matrix
for $C_u$.

Ensemble $\mathcal{G}$: Define a code $C'_u$ for each seed $u \in \mathbb{F}_2^{d'}$ as follows: Let
$G_u$ denote the $k \times n$ matrix that defines the linear function $g(\cdot, u)$.
Then $G_u$ is a generator matrix for $C'_u$.

Construction 5.1: Ensembles $\mathcal{F}$ and $\mathcal{G}$ of error-correcting codes.
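To illustrate how a seeded linear function gives rise to a matrix for each seed, as in Construction 5.1, the sketch below recovers the matrix by evaluating the function on the unit vectors. The function `toy_f` is a hypothetical stand-in chosen only because it is linear in $x$ for every fixed seed; it is not a condenser with any guaranteed parameters, and all names here are assumptions.

```python
from itertools import product


def toy_f(x, u, r):
    """Hypothetical seeded function, linear in x for each fixed seed u:
    output bit i is the inner product (mod 2) of x with u shifted cyclically
    by i positions. It only stands in for a real condenser or extractor."""
    n = len(x)
    return tuple(
        sum(x[j] * u[(j + i) % n] for j in range(n)) % 2 for i in range(r)
    )


def matrix_of(f, n, u, r):
    """Recover the r x n matrix H_u with H_u x = f(x, u): column j of H_u is
    the image of the j-th unit vector."""
    columns = []
    for j in range(n):
        e_j = tuple(1 if t == j else 0 for t in range(n))
        columns.append(f(e_j, u, r))
    return [[columns[j][i] for j in range(n)] for i in range(r)]


def mat_vec(H, x):
    """Matrix-vector product over F_2."""
    return tuple(sum(h * xi for h, xi in zip(row, x)) % 2 for row in H)


if __name__ == "__main__":
    n, r, u = 5, 2, (1, 0, 1, 1, 0)
    H_u = matrix_of(toy_f, n, u, r)
    # The code C_u of the ensemble is the kernel of H_u.
    C_u = [x for x in product((0, 1), repeat=n) if not any(mat_vec(H_u, x))]
    print(H_u, len(C_u))
```

Iterating `matrix_of` over all seeds enumerates the parity check matrices of the whole ensemble; using the recovered matrix as a generator matrix instead gives the second type of ensemble.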
First, suppose that $\mathcal{Y}$ is $\epsilon$-close to a distribution with min-entropy
$m \log q$; i.e., the uniform distribution on $\mathbb{F}_q^m$. Now, the statistical
distance between $\mathcal{Y}$ and the uniform distribution is, by definition,

$$\sum_{x \in A} (q^{-k'} - q^{-m}) = 1 - q^{k' - m}.$$

Since $\epsilon < 1/2$, $q \ge 2$, and $k'$ and $m$ are integers, this implies that the
distance is at least 1/2 (a contradiction) unless $k' = m$, in which case it
becomes zero. Therefore, the output distribution is exactly uniform over
$\mathbb{F}_q^m$.

Now consider the case where $\mathcal{Y}$ is $\epsilon$-close to having min-entropy at least
$k \log q$. Considering that $k' \le k$, the definition of statistical distance
implies that $\epsilon$ is at least

$$\sum_{x \in A} (q^{-k'} - q^{-k}) = 1 - q^{k' - k}.$$

Similarly as before, we get that $k' = k$, meaning that $\mathcal{Y}$ is precisely a
distribution with min-entropy $k \log q$. □

Lemma 5.2. Let $S \subseteq [n]$ be a set of size at most $m$. Then all but a $5\epsilon$
fraction of the codes in $\mathcal{F}$ and all but a $5\epsilon'$ fraction of those in $\mathcal{G}$ can
tolerate the erasure pattern defined by $S$.

Proof. We prove the result for the ensemble $\mathcal{G}$. The argument for $\mathcal{F}$ is
similar. Consider a probability distribution $\mathcal{S}$ on $\mathbb{F}_2^n$ that is uniform on
the coordinates specified by $\bar{S} := [n] \setminus S$ and fixed to zeros elsewhere.
Thus the min-entropy of $\mathcal{S}$ is at least $n - m$, and the distribution
$(U, g(\mathcal{S}, U))$, where $U \sim \mathcal{U}_{d'}$, is $\epsilon'$-close to $\mathcal{U}_{d'+k}$.

By Corollary 2.13, for all but a $5\epsilon'$ fraction of the choices of $u \in \mathbb{F}_2^{d'}$,
the distribution of $g(\mathcal{S}, u)$ is $(1/5)$-close to $\mathcal{U}_k$. Fix such a $u$. By
Proposition 5.1, the distribution of $g(\mathcal{S}, u)$ must in fact be exactly
uniform. Thus, the $k \times |\bar{S}|$ submatrix of $G_u$ consisting of the columns
picked by $\bar{S}$ must have rank $k$, which implies that for every $x \in \mathbb{F}_2^k$, the
projection of the encoding $x \cdot G_u$ to the coordinates chosen by $\bar{S}$ uniquely
identifies $x$. □
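The rank condition at the end of the proof of Lemma 5.2 can be checked directly for a given generator matrix: an erasure pattern is tolerable exactly when the columns that survive the erasures still have rank $k$. The following is a minimal sketch using Gaussian elimination over $\mathbb{F}_2$; the helper names and the small example code are illustrative assumptions.

```python
def gf2_rank(matrix):
    """Rank over F_2 of a matrix given as a list of 0/1 rows."""
    rows = [sum(bit << j for j, bit in enumerate(row)) for row in matrix]
    width = max((len(row) for row in matrix), default=0)
    rank = 0
    for col in range(width):
        pivot = next(
            (i for i in range(rank, len(rows)) if (rows[i] >> col) & 1), None
        )
        if pivot is None:
            continue
        rows[rank], rows[pivot] = rows[pivot], rows[rank]
        for i in range(len(rows)):
            if i != rank and (rows[i] >> col) & 1:
                rows[i] ^= rows[rank]
        rank += 1
    return rank


def tolerates_erasures(G, erased):
    """True iff every message is recoverable from the encoding x G once the
    coordinates in `erased` are removed, i.e., iff the remaining columns of
    the k x n generator matrix G have rank k."""
    k, n = len(G), len(G[0])
    kept = [j for j in range(n) if j not in erased]
    submatrix = [[row[j] for j in kept] for row in G]
    return gf2_rank(submatrix) == k


if __name__ == "__main__":
    # Generator matrix of the [3,2] single parity check code (an example).
    G = [[1, 0, 1], [0, 1, 1]]
    print([tolerates_erasures(G, {j}) for j in range(3)])
    print(tolerates_erasures(G, {0, 1}))
```

Running the same test over every generator matrix of an ensemble gives the fraction of codes that tolerate a fixed erasure pattern, which is the quantity bounded by the lemma.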

The lemma combined with a counting argument implies the following
corollary:

Corollary 5.3. Let $\mathcal{S}$ be any distribution on the subsets of $[n]$ of size at
most $m$. Then all but a $\sqrt{5\epsilon}$ (resp., $\sqrt{5\epsilon'}$) fraction of the codes in $\mathcal{F}$
(resp., $\mathcal{G}$) can tolerate erasure patterns sampled from $\mathcal{S}$ with probability
at least $1 - \sqrt{5\epsilon}$ (resp., $1 - \sqrt{5\epsilon'}$). □

Note that the result holds irrespective of the distribution $\mathcal{S}$, contrary to
the familiar case of BEC($p$) for which the erasure pattern is an i.i.d. (i.e.,
independent and identically-distributed) sequence. For the case of BEC($p$),
the erasure pattern (regarded as its binary characteristic vector in $\mathbb{F}_2^n$) is
given by $S := (S_1, \ldots, S_n)$, where the random variables $S_1, \ldots, S_n \in \mathbb{F}_2$
are i.i.d. and $\Pr[S_i = 1] = p$. We denote this particular distribution by
$\mathcal{B}_{n,p}$, which assigns a nonzero probability to every vector in $\mathbb{F}_2^n$. Thus in
this case we cannot directly apply Corollary 5.3. However, note that $\mathcal{B}_{n,p}$
can be written as a convex combination

(5.1) $\mathcal{B}_{n,p} = (1 - \gamma)\,\mathcal{U}_{n,\le p'} + \gamma \mathcal{D}$,

for $p' := p + \Omega(1)$ that is arbitrarily close to $p$, where $\mathcal{D}$ is an "error
distribution" whose contribution $\gamma$ is exponentially small. The distribution
$\mathcal{U}_{n,\le p'}$ is the distribution $\mathcal{B}_{n,p}$ conditioned on vectors of weight at most
$np'$. Corollary 5.3 applies to $\mathcal{U}_{n,\le p'}$ by setting $m = np'$. Moreover, by the
convex combination above, the erasure decoding error probabilities of any
code for the erasure pattern distributions $\mathcal{B}_{n,p}$ and $\mathcal{U}_{n,\le p'}$ differ by no
more than $\gamma$. Therefore, the above result applied to the erasure distribution
$\mathcal{U}_{n,\le p'}$ handles the particular case of BEC($p$) with essentially no change in
the error probability.



In light of Corollary 5.3, in order to obtain rates arbitrarily close to the
channel capacity, the output lengths of $f$ and $g$ must be sufficiently close
to the entropy requirement $m$. More precisely, it suffices to have
$r \le (1 + \alpha)m$ and $k \ge (1 - \alpha)m$ for arbitrarily small constant $\alpha > 0$. The
seed lengths of $f$ and $g$ determine the size of the code ensemble. Moreover,
the errors of the extractor and condenser determine the erasure error
probability of the resulting code ensemble. As achieving the channel capacity
is the most important concern for us, we will need to instantiate $f$ (resp.,
$g$) with a linear, strong, lossless condenser (resp., extractor) whose output
length is close to $m$. We mention one such instantiation for each function.

For both functions $f$ and $g$, we can use the explicit extractor and lossless
condenser obtained from the Leftover Hash Lemma (Lemma 2.17), which is
optimal in the output length, but requires a large seed, namely, $d = n$.
The ensemble obtained this way will thus have size $2^n$, but attains a
positive error exponent $\delta/2$ for an arbitrary rate loss $\delta > 0$. Using an
optimal lossless condenser or extractor with seed length
$d = \log(n) + O(\log(1/\epsilon))$ and output length close to $m$, it is possible to
obtain a polynomially small capacity-achieving ensemble. However, in order to
obtain an explicit ensemble of codes, the condenser or extractor being used
must be explicit as well.

In the world of linear extractors, we can use Trevisan's extractor
(Theorem 2.20) to improve the size of the ensemble compared to what is
obtained from the Leftover Hash Lemma. In particular, Trevisan's extractor
combined with Corollary 5.3 (using ensemble $\mathcal{G}$) immediately gives the
following result:

Corollary 5.4. Let $p, c > 0$ be arbitrary constants. Then for every integer
$n > 0$, there is an explicit ensemble $\mathcal{G}$ of linear codes of rate $1 - p - o(1)$
such that the size of $\mathcal{G}$ is quasipolynomial, i.e., $|\mathcal{G}| = 2^{O(c^3 \log^3 n)}$, and
all but an $n^{-c} = o(1)$ fraction of the codes in the ensemble have error
probability at most $n^{-c}$ when used over BEC($p$). □
For the ensemble $\mathcal{F}$, on the other hand, we can use the linear lossless
condenser of Guruswami et al. that only requires a logarithmic seed
(Corollary 2.23). Using this condenser combined with Corollary 5.3, we can
strengthen the above result as follows:

Corollary 5.5. Let $p, c, \alpha > 0$ be arbitrary constants. Then for every integer
$n > 0$, there is an explicit ensemble $\mathcal{F}$ of linear codes of rate $1 - p - \alpha$
such that $|\mathcal{F}| = O(n^{c'})$ for a constant $c'$ only depending on $c, \alpha$. Moreover,
all but an $n^{-c} = o(1)$ fraction of the codes in the ensemble have error
probability at most $n^{-c}$ when used over BEC($p$). □

5.3 Codes for the Binary Symmetric Channel 

The goal of this section is to design capacity achieving code ensembles for
the binary symmetric channel BSC($p$). In order to do so, we obtain codes for
the general (and not necessarily memoryless) class $\mathrm{SC}(\mathbb{F}_q, \mathcal{Z})$ of symmetric
channels, where $\mathcal{Z}$ is any flat distribution or sufficiently close to one. For
concreteness, we will focus on the binary case where $q = 2$.

Recall that the capacity of BSC($\mathcal{Z}$), seen as a binary channel, is $1 - h(\mathcal{Z})$
where $h(\mathcal{Z})$ is the entropy rate of $\mathcal{Z}$. The special case BSC($p$) is obtained
by setting $\mathcal{Z} = \mathcal{B}_{n,p}$; i.e., the product distribution of $n$ Bernoulli random
variables with probability $p$ of being equal to 1.

The code ensemble that we use for the symmetric channel is the ensemble
$\mathcal{F}$, obtained from linear lossless condensers, that we introduced in the
preceding section. Thus, we adopt the notation (and parameters) that we used
before for defining the ensemble $\mathcal{F}$. Recall that each code in the ensemble
has rate at least $1 - r/n$. In order to show that the ensemble is capacity
achieving, we consider the following brute-force decoder for each code:

Brute-force decoder for code $C_u$: Given a received word $\hat{y} \in \mathbb{F}_2^n$,
find a codeword $y \in \mathbb{F}_2^n$ of $C_u$ and a vector $z \in \mathrm{supp}(\mathcal{Z})$ such
that $\hat{y} = y + z$. Output $y$, or an arbitrary codeword if no such
pair is found. If there is more than one choice for the codeword $y$,
arbitrarily choose one of them.
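The brute-force decoder can be written down almost verbatim. The sketch below enumerates all codewords, so it takes time exponential in the block length and is only meant to mirror the definition. The Hamming parity check matrix and the choice of noise support (vectors of weight at most one) are illustrative assumptions, as are the helper names.

```python
from itertools import product

# Illustrative parity check matrix ([7,4] Hamming code); not from the text.
H = [[(j >> b) & 1 for j in range(1, 8)] for b in range(3)]


def mat_vec(H, x):
    """Matrix-vector product over F_2."""
    return tuple(sum(h * xi for h, xi in zip(row, x)) % 2 for row in H)


def codewords(H):
    """All vectors in the kernel of H (exponential-time enumeration)."""
    n = len(H[0])
    return [x for x in product((0, 1), repeat=n) if not any(mat_vec(H, x))]


def brute_force_decode(code, received, noise_support):
    """Return a codeword y such that received = y + z for some z in the
    noise support, or an arbitrary codeword if no such pair exists. Ties
    are broken by enumeration order, which is one arbitrary choice."""
    for y in code:
        z = tuple(a ^ b for a, b in zip(received, y))
        if z in noise_support:
            return y
    return code[0]


code = codewords(H)
# Flat noise distribution supported on vectors of Hamming weight at most 1.
noise_support = {tuple(1 if i == j else 0 for i in range(7)) for j in range(7)}
noise_support.add((0,) * 7)
```

For this particular code and noise support the matching pair is always unique, so the decoder never errs; in general the tie-breaking step is where errors can occur.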

For each $u \in \mathbb{F}_2^d$, denote by $\mathcal{E}(C_u, \mathcal{Z})$ the error probability of the above
decoder for code $C_u$ over BSC($\mathbb{F}_2, \mathcal{Z}$). The following lemma quantifies this
probability:

Lemma 5.6. Let $\mathcal{Z}$ be a flat distribution with entropy $m$. Then for at least
a $1 - 2\sqrt{\epsilon}$ fraction of the choices of $u \in \mathbb{F}_2^d$, we have $\mathcal{E}(C_u, \mathcal{Z}) \le \sqrt{\epsilon}$.

Proof. The proof is straightforward from the almost-injectivity property of
lossless condensers discussed in Section 2.2.2. We will use this property to
construct a syndrome decoder for the code ensemble that achieves a
sufficiently small error probability.

By Corollary 2.13, for a $1 - 2\sqrt{\epsilon}$ fraction of the choices of $u \in \{0,1\}^d$,
the distribution $\mathcal{Y} := f(\mathcal{Z}, u)$ is $(\sqrt{\epsilon}/2)$-close to having min-entropy at
least $m$. Fix any such $u$. We show that the error probability $\mathcal{E}(C_u, \mathcal{Z})$ is
bounded by $\sqrt{\epsilon}$.

For each $y \in \mathbb{F}_2^r$, define

$$N(y) := |\{x \in \mathrm{supp}(\mathcal{Z}) : f(x, u) = y\}|$$

and recall that $f(x, u) = H_u \cdot x$. Now suppose that a message is encoded
using the code $C_u$ to an encoding $x \in C_u$, and that $x$ is transmitted
through the channel. The error probability $\mathcal{E}(C_u, \mathcal{Z})$ can be written as

$$\begin{aligned}
\mathcal{E}(C_u, \mathcal{Z}) &= \Pr_{z \sim \mathcal{Z}}[\exists x' \in C_u, \exists z' \in \mathrm{supp}(\mathcal{Z}) \setminus \{z\} : x + z = x' + z'] \\
&\le \Pr_{z \sim \mathcal{Z}}[\exists x' \in C_u, \exists z' \in \mathrm{supp}(\mathcal{Z}) \setminus \{z\} : H_u \cdot (x + z) = H_u \cdot (x' + z')] \\
&= \Pr_{z \sim \mathcal{Z}}[\exists z' \in \mathrm{supp}(\mathcal{Z}) \setminus \{z\} : H_u \cdot z = H_u \cdot z'] \qquad (5.2) \\
&= \Pr_{z \sim \mathcal{Z}}[N(H_u \cdot z) > 1] \\
&= \Pr_{z \sim \mathcal{Z}}[N(f(z, u)) > 1], \qquad (5.3)
\end{aligned}$$

where (5.2) uses the fact that any codeword of $C_u$ is in the right kernel of
$H_u$.

By the first part of Proposition 2.14, there is a set $T \subseteq \mathbb{F}_2^r$ of size at
least $(1 - \sqrt{\epsilon})|\mathrm{supp}(\mathcal{Z})|$ such that $N(y) = 1$ for every $y \in T$. Since $\mathcal{Z}$ is
uniformly distributed on its support, this combined with (5.3) immediately
implies that $\mathcal{E}(C_u, \mathcal{Z}) \le \sqrt{\epsilon}$. □
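The quantity bounded in the proof, namely the probability that the syndrome of the noise has more than one preimage in the support of a flat noise distribution, can be computed exhaustively for small examples. The sketch below does this for an illustrative parity check matrix (the $[7,4]$ Hamming code, not taken from the text); the helper names are hypothetical.

```python
from collections import Counter
from itertools import combinations

# Illustrative parity check matrix ([7,4] Hamming code); not from the text.
H = [[(j >> b) & 1 for j in range(1, 8)] for b in range(3)]


def mat_vec(H, x):
    """Matrix-vector product over F_2."""
    return tuple(sum(h * xi for h, xi in zip(row, x)) % 2 for row in H)


def vectors_of_weight(n, w):
    """All binary vectors of length n and Hamming weight exactly w."""
    return [
        tuple(1 if i in positions else 0 for i in range(n))
        for positions in combinations(range(n), w)
    ]


def confusable_fraction(H, support):
    """Pr[N(H z) > 1] for z uniform on `support`, where N(y) counts the
    vectors of the support whose syndrome equals y."""
    syndromes = [mat_vec(H, z) for z in support]
    multiplicity = Counter(syndromes)
    return sum(1 for s in syndromes if multiplicity[s] > 1) / len(syndromes)


if __name__ == "__main__":
    light = vectors_of_weight(7, 0) + vectors_of_weight(7, 1)
    print(confusable_fraction(H, light))
    print(confusable_fraction(H, vectors_of_weight(7, 2)))
```

On the support of weight at most one the map $z \mapsto H z$ is injective and the fraction is zero, whereas on the 21 vectors of weight two (entropy above the $r = 3$ output bits) every vector collides with another one.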

The lemma implies that any linear lossless condenser with entropy
requirement $m$ can be used to construct an ensemble of codes such that all
but a small fraction of the codes are good for reliable transmission over
BSC($\mathcal{Z}$), where $\mathcal{Z}$ is an arbitrary flat distribution with entropy at most $m$.
Similar to the case of BEC, the seed length determines the size of the
ensemble, the error of the condenser bounds the error probability of the
decoder, and the output length determines the proximity of the rate to the
capacity of the channel. Again, using the condenser given by the Leftover
Hash Lemma (Lemma 2.17), we can obtain a capacity achieving ensemble of size
$2^n$. Moreover, using the linear lossless condenser of Guruswami et al.
(Corollary 2.23) the ensemble can be made polynomially small (similar to the
result given by Corollary 5.5).

It is not hard to see that the converse of the above result is also true;
namely, that any ensemble of linear codes that is universally capacity
achieving with respect to any choice of the noise distribution $\mathcal{Z}$ defines a
strong, linear, lossless condenser. This is spelled out in the lemma below.
Lemma 5.7. Let $\{C_1, \ldots, C_T\}$ be a binary code ensemble of length $n$ and
dimension $n - r$ such that for every flat distribution $\mathcal{Z}$ with min-entropy at
most $m$ on $\mathbb{F}_2^n$, all but a $\gamma$ fraction of the codes in the ensemble (for some
$\gamma \in [0, 1)$) achieve error probability at most $\epsilon$ (under syndrome decoding)
when used over $\mathrm{SC}(\mathbb{F}_q, \mathcal{Z})$. Then the function $f\colon \mathbb{F}_2^n \times [T] \to \mathbb{F}_2^r$ defined as

$$f(x, u) := H_u \cdot x,$$

where $H_u$ is a parity check matrix of $C_u$, is a strong, lossless,
$(m, 2\epsilon + \gamma)$-condenser.

Proof. The proof is straightforward using similar arguments as in Lemma 5.6.
Without loss of generality (by Proposition 2.8), let $\mathcal{Z}$ be a flat distribution
with min-entropy $m$, and denote by $D\colon \mathbb{F}_2^r \to \mathbb{F}_2^n$ the corresponding syndrome
decoder. Moreover, without loss of generality we have taken the decoder to be
a deterministic function. For a randomized decoder, one can fix the internal
coin flips so as to preserve the upper bound on its error probability. Now
let $u$ be chosen such that $C_u$ achieves an error probability at most $\epsilon$ (we
know this is the case for at least $(1 - \gamma)T$ of the choices of $u$).

Denote by $\mathcal{T} \subseteq \mathrm{supp}(\mathcal{Z})$ the set of noise realizations that can potentially
confuse the syndrome decoder. Namely,

$$\mathcal{T} := \{z \in \mathrm{supp}(\mathcal{Z}) : \exists z' \in \mathrm{supp}(\mathcal{Z}),\ z' \neq z,\ H_u \cdot z = H_u \cdot z'\}.$$

Note that, for a random $Z \sim \mathcal{Z}$, conditioned on the event that $Z \in \mathcal{T}$, the
probability that the syndrome decoder errs on $Z$ is at least 1/2, since we
know that $Z$ can be confused by at least one different noise realization. We
can write this more precisely as

$$\Pr[D(H_u \cdot Z) \neq Z \mid Z \in \mathcal{T}] \ge 1/2.$$

Since the error probability of the decoder is upper bounded by $\epsilon$, we
conclude that

$$\Pr[Z \in \mathcal{T}] \le 2\epsilon.$$

Therefore, the fraction of the elements on the support of $\mathcal{Z}$ that collide
with some other element under the mapping defined by $H_u$ is at most $2\epsilon$.
Namely,

$$|\{H_u \cdot z : z \in \mathrm{supp}(\mathcal{Z})\}| \ge 2^m (1 - 2\epsilon),$$

and this is true for at least a $1 - \gamma$ fraction of the choices of $u$. Thus,
for a uniformly random $U \in [T]$ and $Z \sim \mathcal{Z}$, the distribution of
$(U, H_U \cdot Z)$ has a support of size at least

$$(1 - \gamma)(1 - 2\epsilon) T 2^m \ge (1 - \gamma - 2\epsilon) T 2^m.$$

By the second part of Proposition 2.14, we conclude that this distribution is
$(2\epsilon + \gamma)$-close to having entropy $m + \log T$ and thus, the function $f$ defined
in the statement is a strong lossless $(m, 2\epsilon + \gamma)$-condenser. □
By this lemma, the known lower bounds on the seed length and the output
length of lossless condensers that we discussed in Chapter 2 translate into
lower bounds on the size of the code ensemble and proximity to the capacity
that can be obtained from our framework. In particular, in order to get a
positive error exponent (i.e., exponentially small error in the block
length), the size of the ensemble must be exponentially large.

It is worthwhile to point out that the code ensembles $\mathcal{F}$ and $\mathcal{G}$ discussed
in this and the preceding section preserve their erasure and error correcting
properties under any change of basis in the ambient space $\mathbb{F}_2^n$, due to the
fact that a change of basis applied on any linear condenser results in a
linear condenser with the same parameters. This is a property achieved by the
trivial, but large, ensemble of codes defined by the set of all $r \times n$ parity
check matrices. Observe that no single code can be universal in this sense,
and it is inevitable to have a sufficiently large ensemble to attain this
property.

The Case BSC(p) 

For the special case of BSC($p$), the noise distribution $\mathcal{B}_{n,p}$ is not a flat
distribution. Fortunately, similar to the BEC case, we can again use convex
combinations to show that the result obtained in Lemma 5.6 can be extended to
this important noise distribution. The main tool that we need is an extension
of Lemma 5.6 to convex combinations with a small number of components.

Suppose that the noise distribution $\mathcal{Z}$ is not a flat distribution but can be
written as a convex combination

(5.4) $\mathcal{Z} = \alpha_1 \mathcal{Z}_1 + \cdots + \alpha_t \mathcal{Z}_t$

of $t$ flat distributions, where the number $t$ of summands is not too large,
and

$$|\mathrm{supp}(\mathcal{Z}_1)| \ge |\mathrm{supp}(\mathcal{Z}_2)| \ge \cdots \ge |\mathrm{supp}(\mathcal{Z}_t)|.$$

For this more general case, we need to slightly tune our brute-force decoder
in the way it handles ties. In particular, we now require the decoder to find
a codeword $y \in C_u$ and a potential noise vector $z \in \mathrm{supp}(\mathcal{Z})$ that add up to
the received word, as before. However, in case more than one matching pair
is found, we will require the decoder to choose the one whose noise vector
$z$ belongs to the component among $\mathcal{Z}_1, \ldots, \mathcal{Z}_t$ with smallest support (i.e.,
largest index). If the noise vector $z \in \mathrm{supp}(\mathcal{Z}_i)$ that maximizes the index
$i$ is still not unique, the decoder can arbitrarily choose one. Under these
conventions, we can now prove the following:



Lemma 5.8. Suppose that a noise distribution $\mathcal{Z}$ is as in (5.4), where each
component $\mathcal{Z}_i$ has entropy at most $m$, and the function $f$ defining the
ensemble $\mathcal{F}$ is a strong lossless $(\le m + 1, \epsilon)$-condenser. Then for at least a
$1 - t(t + 1)\sqrt{\epsilon}$ fraction of the choices of $u \in \mathbb{F}_2^d$, the brute-force decoder
satisfies $\mathcal{E}(C_u, \mathcal{Z}) \le 2t\sqrt{\epsilon}$.
Proof. For each $1 \le i \le j \le t$, we define a flat distribution $\mathcal{Z}_{ij}$ that is
uniformly supported on $\mathrm{supp}(\mathcal{Z}_i) \cup \mathrm{supp}(\mathcal{Z}_j)$. Observe that each $\mathcal{Z}_{ij}$ has
min-entropy at most $m + 1$ and thus the function $f$ is a lossless condenser
with error at most $\epsilon$ for this source. By Corollary 2.13 and a union bound,
for a $1 - t(t + 1)\sqrt{\epsilon}$ fraction of the choices of $u \in \{0,1\}^d$, all $t(t + 1)/2$
distributions

$$f(\mathcal{Z}_{ij}, u) : 1 \le i \le j \le t$$

are simultaneously $(\sqrt{\epsilon}/2)$-close to having min-entropy at least $m$. Fix any
such $u$.

Consider a random variable $Z$, representing the channel noise, that is
sampled from $\mathcal{Z}$ as follows: First choose an index $I \in [t]$ randomly according
to the distribution induced by $(\alpha_1, \ldots, \alpha_t)$ over the indices, and then
sample a random noise $Z \sim \mathcal{Z}_I$. Using the same line of reasoning leading to
(5.2) in the proof of Lemma 5.6, the error probability with respect to the
code $C_u$ (i.e., the probability that the tuned brute-force decoder gives a
wrong estimate on the noise realization $Z$) can now be bounded as

$$\mathcal{E}(C_u, \mathcal{Z}) \le \Pr_{I, Z}[\exists i \in \{I, \ldots, t\}, \exists z' \in \mathrm{supp}(\mathcal{Z}_i) \setminus \{Z\} : f(Z, u) = f(z', u)].$$

For $i = 1, \ldots, t$, denote by $\mathcal{E}_i$ the right hand side probability in the above
bound conditioned on the event that $I = i$. Fix any choice of the index $i$.
Now it suffices to obtain an upper bound on $\mathcal{E}_i$ irrespective of the choice of
$i$, since

$$\mathcal{E}(C_u, \mathcal{Z}) \le \sum_{i \in [t]} \alpha_i \mathcal{E}_i.$$

We call a noise realization $z \in \mathrm{supp}(\mathcal{Z}_i)$ confusable if

$$\exists j \ge i, \exists z' \in \mathrm{supp}(\mathcal{Z}_j) \setminus \{z\} : f(z, u) = f(z', u).$$

That is, a noise realization is confusable if it can potentially cause the
brute-force decoder to compute a wrong noise estimate. Our goal is to obtain
an upper bound on the fraction of vectors on $\mathrm{supp}(\mathcal{Z}_i)$ that are confusable.

For each $j \ge i$, we know that $f(\mathcal{Z}_{ij}, u)$ is $(\sqrt{\epsilon}/2)$-close to having
min-entropy at least $m$. Therefore, by the first part of Proposition 2.14, the
set of confusable elements

$$\{z \in \mathrm{supp}(\mathcal{Z}_i) : \exists z' \in \mathrm{supp}(\mathcal{Z}_j) \setminus \{z\} \text{ such that } f(z, u) = f(z', u)\}$$

has size at most $\sqrt{\epsilon}\,|\mathrm{supp}(\mathcal{Z}_{ij})| \le 2\sqrt{\epsilon}\,|\mathrm{supp}(\mathcal{Z}_i)|$ (using the fact that,
since $j \ge i$, the support of $\mathcal{Z}_j$ is no larger than that of $\mathcal{Z}_i$). By a union
bound on the choices of $j$, we see that the fraction of confusable elements on
$\mathrm{supp}(\mathcal{Z}_i)$ is at most $2t\sqrt{\epsilon}$. Therefore, $\mathcal{E}_i \le 2t\sqrt{\epsilon}$ and we get the desired
upper bound on the error probability of the brute-force decoder. □
The result obtained by Lemma 5.8 can be applied to the channel BSC($p$)
by observing that the noise distribution $\mathcal{B}_{n,p}$ can be written as a convex
combination

$$\mathcal{B}_{n,p} = \sum_{i = n(p - \eta)}^{n(p + \eta)} \alpha_i\,\mathcal{U}_{n,i} + \gamma \mathcal{D},$$

where $\mathcal{U}_{n,i}$ denotes the flat distribution supported on binary vectors of
length $n$ and Hamming weight exactly $i$, and $\mathcal{D}$ is the distribution $\mathcal{B}_{n,p}$
conditioned on the vectors whose Hamming weights lie outside the range
$[n(p - \eta), n(p + \eta)]$. The parameter $\eta > 0$ can be chosen as an arbitrarily
small real number, so that the min-entropies of the distributions $\mathcal{U}_{n,i}$
become arbitrarily close to the Shannon entropy of $\mathcal{B}_{n,p}$; namely, $nh(p)$. This
can be seen by the estimate

$$\binom{n}{w} = 2^{nh(w/n) \pm o(n)},$$

$h(\cdot)$ being the binary entropy function, that is easily derived from
Stirling's formula. By Chernoff bounds, the error $\gamma$ can be upper bounded as

$$\gamma = \Pr[\,|\mathrm{wgt}(Z) - np| > \eta n\,] \le 2e^{-c_\eta np} = 2^{-\Omega(n)},$$

where $c_\eta > 0$ is a constant only depending on $\eta$, and is thus exponentially
small. Thus the error probabilities attained by any code under the noise
distributions $\mathcal{B}_{n,p}$ and $\mathcal{Z} := \sum_{i = n(p - \eta)}^{n(p + \eta)} \alpha_i\,\mathcal{U}_{n,i}$ differ by the
exponentially small quantity $\gamma$. We may now apply Lemma 5.8 on the noise
distribution $\mathcal{Z}$ to attain code ensembles for the binary symmetric channel
BSC($p$). The error probability of the ensemble is at most $2n\sqrt{\epsilon}$, and this
bound is satisfied by at least a $1 - n^2\sqrt{\epsilon}$ fraction of the codes. Finally,
the code ensemble is capacity achieving for BSC($p$) provided that the
condenser $f$ attains an output length $r \le (1 + \alpha)h(p + \eta)n$ for arbitrarily
small constant $\alpha$, and $\epsilon = o(n^{-4})$.
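The decomposition of the binomial noise distribution into flat components of fixed Hamming weight can be examined numerically. The sketch below computes the weights $\alpha_i$ (the probability that the noise has weight exactly $i$) and the leftover mass $\gamma$, and compares $\log_2 \binom{n}{w}$ with $nh(w/n)$. The function names and the concrete parameters ($n = 400$, $p = 0.1$, weights between 20 and 60) are illustrative assumptions.

```python
from math import comb, log2


def h(x: float) -> float:
    """Binary entropy function."""
    return 0.0 if x in (0.0, 1.0) else -x * log2(x) - (1 - x) * log2(1 - x)


def decompose(n: int, p: float, lo: int, hi: int):
    """Weights alpha_i of the flat components U_{n,i} for lo <= i <= hi,
    together with the mass gamma of all remaining Hamming weights."""
    alphas = {
        i: comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(lo, hi + 1)
    }
    gamma = 1.0 - sum(alphas.values())
    return alphas, gamma


if __name__ == "__main__":
    n, p = 400, 0.1
    alphas, gamma = decompose(n, p, 20, 60)  # corresponds to eta = 0.05
    print("number of components:", len(alphas))
    print("leftover mass gamma:", gamma)
    # Entropy rate of one component versus the binary entropy of its weight.
    print(log2(comb(n, 40)) / n, h(40 / n))
```

The number of flat components is at most $n + 1$, which is what keeps the loss incurred by the union bound in Lemma 5.8 polynomial in $n$.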

5.4 Explicit Capacity Achieving Codes 

In the preceding sections, we showed how to obtain small ensembles of explicit 
capacity achieving codes for various discrete channels, including the important 
special cases BEC(p) and BSC(p). Two drawbacks related to these construc- 
tions are: 

1. While an overwhelming fraction of the codes in the ensemble are capacity
achieving, in general it is not clear how to pin down a single, capacity
achieving code in the ensemble.

2. For the symmetric additive noise channels, the brute-force decoder is
extremely inefficient and is of interest only for proving that the
constructed ensembles are capacity achieving.
In a classic work, Justesen [87] showed that the idea of code concatenation⁵
first introduced by Forney [60] can be used to transform any ensemble of
capacity achieving codes, for a memoryless channel, into an explicit,
efficiently decodable code with improved error probability over the same
channel. In this section we revisit this idea and apply it to our ensembles.
For concreteness, we focus on the binary case and consider a memoryless
channel $\mathcal{C}$ that is either BEC($p$) or BSC($p$).

Throughout this section, we consider an ensemble $\mathcal{S}$ of linear codes with
block length $n$ and rate $R$, for which it is guaranteed that all but a
$\gamma = o(1)$ fraction of the codes are capacity achieving (for a particular
DMSC, in our case either BEC($p$) or BSC($p$)) with some vanishing error
probability $\eta = o(1)$ (the asymptotics are considered with respect to the
block length $n$).

Justesen's concatenated codes use an outer code $C_{\mathrm{out}}$ of block length
$s := |\mathcal{S}|$, alphabet $\mathbb{F}_{2^k}$, and rate $R'$. The particular choice of the
outer code in the original construction is Reed-Solomon codes. However, we
point out that any outer code that allows unique decoding of some constant
fraction of errors at rates arbitrarily close to one would suffice for the
purpose of constructing capacity achieving codes. In particular, in this
section we will use an expander-based construction of asymptotically good
codes due to Spielman [141], from which the following theorem can be easily
derived⁶:

Theorem 5.9. For every integer $k > 0$ and every absolute constant $R' < 1$,
there is an explicit family of $\mathbb{F}_2$-linear codes over $\mathbb{F}_{2^k}$ for every block
length and rate $R'$ that is error-correcting for an $\Omega(1)$ fraction of errors.
The running time of the encoder and the decoder is linear in the bit-length
of the codewords.

5.4.1 Justesen's Concatenation Scheme 

The concatenation scheme of Justesen differs from traditional concatenation
in that the outer code is concatenated with an ensemble of codes rather than
a single inner code.

In this construction, the size of the ensemble is taken to match the block
length of the outer code, and each symbol of the outer code is encoded with
one of the inner codes in the ensemble. We use the notation
$C := C_{\mathrm{out}} \diamond \mathcal{S}$ to denote concatenation of an outer code $C_{\mathrm{out}}$ with the
ensemble $\mathcal{S}$ of inner codes. Suppose that the alphabet size of the outer code
is taken as $2^{\lfloor Rn \rfloor}$, where we recall that $n$ and $R$ denote the block length
and rate of the inner codes in $\mathcal{S}$.

The encoding of a message with the concatenated code can be obtained
as follows: First, the message is encoded using $C_{\mathrm{out}}$ to obtain an encoding
$(c_1, \ldots, c_s) \in \mathbb{F}_{2^k}^s$, where $k = \lfloor Rn \rfloor$ denotes the dimension of the inner
codes.



5 A quick review of code concatenation and its basic properties appears in Appendix A.
6 There are alternative choices of the outer code that lead to a similar result, e.g.,
expander-based codes due to Guruswami and Indyk [76].

Figure 5.3: Justesen's concatenation scheme.



Then, for each $i \in [s]$, the $i$th symbol $c_i$ of the encoding is further encoded by
the $i$th code in the ensemble $\mathcal{S}$ (under some arbitrary ordering of the codes
in the ensemble), resulting in a binary sequence $c'_i$ of length $n$. The $ns$-bit
long binary sequence $(c'_1, \ldots, c'_s)$ defines the encoding of the message under
$C_{out} \diamond \mathcal{S}$. The concatenation scheme is depicted in Figure 5.3.

Similar to classical concatenated codes, the resulting binary code $C$ has
block length $N := ns$ and dimension $K := kk'$, where $k'$ is the dimension of
the outer code $C_{out}$. However, the neat idea in Justesen's concatenation is
that it eliminates the need for a brute-force search for finding a good inner
code, as long as almost all inner codes are guaranteed to be good.
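The encoding step described above can be sketched as follows. This is only an illustration under stated assumptions: the outer codeword is taken as given (the outer encoder is not implemented), each outer symbol is represented as a $k$-bit tuple, and the ensemble is a list of $s$ hypothetical $k \times n$ generator matrices over $\mathbb{F}_2$. The names `encode_inner` and `concatenate` and the toy matrices are not from the original construction.

```python
from typing import List, Sequence

Bits = List[int]
Matrix = List[List[int]]  # k x n generator matrix over GF(2)


def encode_inner(symbol: Sequence[int], gen: Matrix) -> Bits:
    """Encode one k-bit outer symbol with one inner code: symbol * gen over GF(2)."""
    n = len(gen[0])
    return [sum(symbol[r] * gen[r][c] for r in range(len(gen))) % 2
            for c in range(n)]


def concatenate(outer_codeword: Sequence[Sequence[int]],
                ensemble: Sequence[Matrix]) -> Bits:
    """Encode the i-th outer symbol with the i-th inner code of the ensemble."""
    if len(outer_codeword) != len(ensemble):
        raise ValueError("ensemble size must equal the outer block length s")
    out: Bits = []
    for symbol, gen in zip(outer_codeword, ensemble):
        out.extend(encode_inner(symbol, gen))
    return out


# Toy instance (hypothetical): s = 2 inner codes with k = 2 and n = 3.
ensemble = [
    [[1, 0, 1], [0, 1, 1]],
    [[1, 1, 0], [0, 1, 1]],
]
outer_codeword = [(1, 1), (1, 0)]  # assumed to come from the outer encoder
encoded = concatenate(outer_codeword, ensemble)
```

The output has length $ns$, matching the block length $N := ns$ of the concatenated code.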



5.4.2 The Analysis 

In order to analyze the error probability attained by the concatenated code
$C_{out} \diamond \mathcal{S}$, we consider the following naive decoder^7:

1. Given a received sequence $(y_1, \ldots, y_s) \in (\mathbb{F}_2^n)^s$, apply an appropriate
decoder for the inner codes (e.g., the brute-force decoder for BSC, or
Gaussian elimination for BEC) to decode each $y_i$ to a codeword $c'_i$ of the
$i$th code in the ensemble.

2. Apply the outer code decoder, which is guaranteed to correct some constant
fraction of errors, on $(c'_1, \ldots, c'_s)$ to obtain a codeword $(c_1, \ldots, c_s)$ of the
outer code $C_{out}$.

3. Recover the decoded sequence from the corrected encoding $(c_1, \ldots, c_s)$.

7 Alternatively, one could use methods such as Forney's Generalized Minimum Distance
(GMD) decoder for Reed-Solomon codes [60]. However, the naive decoder suffices for our
purposes and works for any asymptotically good choice of the outer code.
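Step 1 of the naive decoder, for the symmetric channel, can be sketched as a brute-force nearest-codeword search applied block by block. This is a minimal illustration, not the decoder of the original text: the generator matrices and the function names are hypothetical, and the outer decoder of step 2 is not implemented.

```python
from itertools import product
from typing import List, Sequence, Tuple

Matrix = List[List[int]]  # k x n generator matrix over GF(2)


def encode(message: Sequence[int], gen: Matrix) -> List[int]:
    """Codeword message * gen over GF(2)."""
    n = len(gen[0])
    return [sum(message[r] * gen[r][c] for r in range(len(gen))) % 2
            for c in range(n)]


def brute_force_decode(received: Sequence[int], gen: Matrix) -> Tuple[int, ...]:
    """Return a message whose codeword is nearest to `received` in Hamming distance.

    All 2^k messages are tried, so the cost is exponential in k.
    """
    k = len(gen)

    def distance(message: Tuple[int, ...]) -> int:
        return sum(a != b for a, b in zip(encode(message, gen), received))

    return min(product((0, 1), repeat=k), key=distance)


def decode_inner_blocks(blocks: Sequence[Sequence[int]],
                        ensemble: Sequence[Matrix]) -> List[Tuple[int, ...]]:
    """Decode the i-th received block with the i-th inner code."""
    return [brute_force_decode(y, gen) for y, gen in zip(blocks, ensemble)]


# Toy inner code (hypothetical): k = 2, n = 6, minimum distance 3.
gen = [[1, 1, 1, 0, 0, 0], [0, 0, 0, 1, 1, 1]]
decoded = decode_inner_blocks([[1, 0, 1, 0, 0, 0], [0, 0, 0, 1, 1, 0]], [gen, gen])
```

Each received block above contains a single bit flip, which the toy code corrects.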

Since the channel is assumed to be memoryless, the noise distributions on
inner codes are independent. Let $\mathcal{G} \subseteq [s]$ denote the set of coordinate positions
corresponding to "good" inner codes in $\mathcal{S}$ that achieve an error probability
bounded by $\eta$. By assumption, we have $|\mathcal{G}| \ge (1 - \gamma)|\mathcal{S}|$.

Suppose that the outer code $C_{out}$ corrects some $\gamma + \alpha$ fraction of adversarial
errors, for a constant $\alpha > \eta$. Then an error might occur only if more
than $\alpha s$ of the codes in $\mathcal{G}$ fail to obtain a correct decoding. We expect the
number of failures within the good inner codes to be $\eta|\mathcal{G}|$. Due to the noise
independence, it is possible to show that the fraction of failures may deviate
from the expectation $\eta$ only with a negligible probability. In particular, a
direct application of the Chernoff bound implies that the probability that more
than an $\alpha$ fraction of the good inner codes err is at most

(5.5)   $\eta^{\alpha'|\mathcal{G}|} = 2^{-\Omega_{\alpha}(\log(1/\eta)\, s)}$,

where $\alpha' > 0$ is a constant that only depends on $\alpha$. This also upper bounds
the error probability of the concatenated code. In particular, we see that if
the error probability $\eta$ of the inner codes is exponentially small in their block
length $n$, the concatenated code also achieves an exponentially small error in
its block length $N$.
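As a numerical sanity check of the tail estimate, the sketch below compares the exact binomial tail with the standard relative-entropy form of the Chernoff bound, $\exp(-g\, D(\alpha \| \eta))$. It assumes, for illustration only, that each of $g$ good inner codes fails independently with probability exactly $\eta$; the function names and the sample parameters are hypothetical, and the bound used here is a textbook variant rather than the specific expression (5.5).

```python
import math


def binomial_tail(g: int, eta: float, threshold: int) -> float:
    """Exact Pr[X > threshold] for X ~ Binomial(g, eta)."""
    return sum(math.comb(g, j) * eta**j * (1 - eta)**(g - j)
               for j in range(threshold + 1, g + 1))


def kl_divergence(a: float, b: float) -> float:
    """Binary relative entropy D(a || b) in nats, for 0 < a, b < 1."""
    return a * math.log(a / b) + (1 - a) * math.log((1 - a) / (1 - b))


def chernoff_bound(g: int, eta: float, alpha: float) -> float:
    """Upper bound exp(-g * D(alpha || eta)) on Pr[X >= alpha * g], valid for alpha > eta."""
    return math.exp(-g * kl_divergence(alpha, eta))


# Hypothetical parameters: 200 good inner codes, eta = 0.01, alpha = 0.1.
g, eta, alpha = 200, 0.01, 0.1
exact = binomial_tail(g, eta, math.floor(alpha * g))
bound = chernoff_bound(g, eta, alpha)
```

For fixed $\alpha$ and small $\eta$, the exponent $g\, D(\alpha \| \eta)$ grows like $\alpha g \log(1/\eta)$, which is the shape of the exponent in (5.5).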

Now we analyze the encoding and decoding complexity of the concatenated
code, assuming that Spielman's expander codes (Theorem 5.9) are used for
the outer code. With this choice, the outer code becomes equipped with a
linear-time encoder and decoder. Since any linear code can be encoded in
quadratic time (in its block length), the concatenated code can be encoded in
time $O(n^2 s)$, which for $s \gg n$ can be considered "almost linear" in the block length
$N = ns$ of $C$. The decoding time of each inner code is cubic in $n$ for the erasure
channel, since decoding reduces to Gaussian elimination, and thus for this case
the naive decoder runs in time $O(n^3 s)$. For the symmetric channel, however,
the brute-force decoder used for the inner codes takes exponential time in
the block length, namely, $2^{Rn}\,\mathrm{poly}(n)$. Therefore, the running time of the
decoder for the concatenated code becomes bounded by $O(2^{Rn} s\,\mathrm{poly}(n))$. When
the inner ensemble is exponentially large, i.e., $s = 2^n$ (which is the case for
our ensembles if we use the Leftover Hash Lemma), the decoding complexity
becomes $O(s^{1+R}\,\mathrm{poly}(\log s))$, which is at most quadratic in the block length of
$C$.
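The last substitution can be spelled out as a short worked computation, under the stated assumption $s = 2^n$ (so that $n = \log s$):

```latex
\[
  2^{Rn} \cdot s \cdot \mathrm{poly}(n)
  \;=\; (2^{n})^{R} \cdot s \cdot \mathrm{poly}(\log s)
  \;=\; s^{1+R}\,\mathrm{poly}(\log s),
  \qquad
  N = ns = s \log s .
\]
```

Since $R \le 1$ and $s \le N$, this is at most $N^{2}\,\mathrm{poly}(\log N)$.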

Since the rate $R'$ of the outer code can be made arbitrarily close to 1, the rate
of the concatenated code $C$ can be made arbitrarily close to the rate $R$ of the
inner codes. Thus, if the ensemble of inner codes is capacity-achieving, so
would be the concatenated code.

5.4.3 Density of the Explicit Family

In the preceding section we saw how to obtain explicit capacity achieving
codes from capacity achieving code ensembles using concatenation. One of
the important properties of the resulting family of codes that is influenced by
the size of the inner code ensemble is the set of block lengths $N$ for which the
concatenated code is defined. Recall that $N = ns$, where $n$ and $s$ respectively
denote the block length of the inner codes and the size of the code ensemble,
and the parameter $s$ is a function of $n$. For instance, for all classical examples
of capacity achieving code ensembles (namely, Wozencraft's ensemble, Goppa
codes and shortened cyclic codes) we have $s(n) = 2^n$. In this case, the resulting
explicit family of codes would be defined for integer lengths of the form $N(i) = i \cdot 2^i$.

A trivial approach for obtaining capacity achieving codes for all lengths
is to use a padding trick. Suppose that we wish to transmit a particular
bit sequence of length $K$ through the channel using the concatenated code
family of rate $\rho$ that is taken to be sufficiently close to the channel capacity.
The sequence might originate from a source that does not produce a constant
stream of bits (e.g., consider a terminal emulator that produces data only
when a user input is available).

Ideally, one requires the length of the encoded sequence to be $N = \lceil K/\rho \rceil$.
However, since the family might not be defined for the block length $N$, we
might be forced to take a code $C$ in the family with smallest length $N' \ge N$
that is of the form $N' = n\,s(n)$, for some integer $n$, and pad the original
message with redundant symbols. This way we have encoded a sequence of
length $K$ to one of length $N'$, implying an effective rate $K/N'$. The rate loss
incurred by padding is thus equal to $\rho - K/N' = K(1/N - 1/N')$. Thus, if
$N' \ge N(1 + \delta)$ for some positive constant $\delta > 0$, the rate loss becomes lower
bounded by a constant and thus, even if the original concatenated family is
capacity achieving, it no longer remains capacity achieving when extended to
arbitrarily chosen lengths using the padding trick.
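The effect of padding can be illustrated numerically. The sketch below finds the smallest defined length $N' = n\,s(n) \ge N$ and the resulting effective rate, for an exponential-size ensemble and for a polynomial-size one. The helper names and the choice $s(n) = n^2$ are hypothetical and serve only as an example of a sub-exponential size.

```python
import math
from typing import Callable


def next_defined_length(target: int, s: Callable[[int], int]) -> int:
    """Smallest N' = n * s(n) with N' >= target, assuming n * s(n) increases with n."""
    n = 1
    while n * s(n) < target:
        n += 1
    return n * s(n)


def padded_rate(K: int, rho: float, s: Callable[[int], int]) -> float:
    """Effective rate K / N' after padding up to the next defined block length."""
    target = math.ceil(K / rho)
    return K / next_defined_length(target, s)


K, rho = 81, 0.5                                     # ideal length N = 162
rate_exp = padded_rate(K, rho, lambda n: 2 ** n)     # lengths 2, 8, 24, 64, 160, 384, ...
rate_poly = padded_rate(K, rho, lambda n: n * n)     # lengths 1, 8, 27, 64, 125, 216, ...
```

With $s(n) = 2^n$ the ideal length 162 just misses 160 and must be padded to 384, so more than half of the rate is lost; consecutive lengths differ by a factor of about 2, so this loss does not vanish as $K$ grows.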

Therefore, if we require the explicit family obtained from concatenation to
remain capacity achieving for all lengths, the set of block lengths $\{i\,s(i)\}_{i \in \mathbb{N}}$
for which it is defined must be sufficiently dense. This is the case provided
that we have

$\frac{s(n)}{s(n+1)} = 1 - o(1)$,

which, in turn, requires the capacity achieving code ensemble to have a sub-exponential
size (by which we mean $s(n) = 2^{o(n)}$).

Using the framework introduced in this chapter, linear extractors and lossless
condensers that achieve nearly optimal parameters would result in code
ensembles of polynomial size in $n$. The explicit erasure code ensemble obtained
from Trevisan's extractor (Corollary 5.4) or Guruswami-Umans-Vadhan's lossless
condenser (Corollary 5.5) combined with Justesen's concatenation scheme
results in an explicit sequence of capacity achieving codes for the binary erasure
channel that is defined for every block length, and allows almost linear-time
(i.e., $N^{1+o(1)}$) encoding and decoding. Moreover, the latter sequence of
codes that is obtained from a lossless condenser is capacity achieving for the
binary symmetric channel (with a matching bit-flip probability) as well.

5.5 Duality of Linear Affine Condensers 



In Section 5.2 we saw that linear extractors for bit-fixing sources can be used
to define generator matrices of a family of erasure-decodable codes. On the
other hand, we showed that linear lossless condensers for bit-fixing sources
define parity check matrices of erasure-decodable codes.

Recall that generator and parity check matrices are dual notions, and in
our construction we have considered matrices in one-to-one correspondence
with linear mappings. Indeed, we have used linear mappings defined by extractors
and lossless condensers to obtain generator and parity check matrices
of our codes (where the $i$th row of the matrix defines the coefficient vector of
the linear form corresponding to the $i$th output of the mapping). Thus, we get
a natural duality between linear functions: If two linear functions represent
generator and parity check matrices of the same code, they can be considered
dual^8. Just in the same way that the number of rows of a generator matrix
and the corresponding parity check matrix add up to their number of columns
(provided that there is no linear dependence between the rows), the dual of a
linear function mapping $\mathbb{F}_q^n$ to $\mathbb{F}_q^m$ (where $m < n$) that has no linear
dependencies among its outputs can be taken to be a linear function mapping
$\mathbb{F}_q^n$ to $\mathbb{F}_q^{n-m}$.

In fact, a duality between linear extractors and lossless condensers for
affine sources is implicit in the analysis leading to Corollary 5.3. Namely, it
turns out that if a linear function is an extractor for an affine source, the dual
function becomes a lossless condenser for the dual distribution, and vice versa.
This is made precise (and slightly more general) in the following theorem.

Theorem 5.10. Suppose that the linear mapping defined by a matrix
$G \in \mathbb{F}_q^{m \times n}$ of rank $m < n$ is a $(k \log q) \to_0 (k' \log q)$ condenser for a $k$-dimensional
affine source $\mathcal{X}$ over $\mathbb{F}_q^n$ so that for $X \sim \mathcal{X}$, the distribution of $G \cdot X^\top$ has
entropy at least $k' \log q$. Let $H \in \mathbb{F}_q^{(n-m) \times n}$ be a dual matrix for $G$ (i.e.,
$G H^\top = 0$) of rank $n - m$ and $\mathcal{Y}$ be an $(n - k)$-dimensional affine space
over $\mathbb{F}_q^n$ supported on a translation of the dual subspace corresponding to the
support of $\mathcal{X}$. Then for $Y \sim \mathcal{Y}$, the distribution of $H \cdot Y^\top$ has entropy at least
$(n - k + k' - m) \log q$.

8 Note that, under this notion of duality, the dual of a linear function need not be unique
even though its linear-algebraic properties (e.g., kernel) would be independent of its choice.

Proof. Suppose that $\mathcal{X}$ is supported on a set

$\{x \cdot A_G + a : x \in \mathbb{F}_q^k\}$,

where $A_G \in \mathbb{F}_q^{k \times n}$ has rank $k$ and $a \in \mathbb{F}_q^n$ is a fixed row vector. Moreover, we
denote the dual distribution $\mathcal{Y}$ by the set

$\{y \cdot A_H + b : y \in \mathbb{F}_q^{n-k}\}$,

where $b \in \mathbb{F}_q^n$ is fixed and $A_H \in \mathbb{F}_q^{(n-k) \times n}$ is of rank $n - k$, and we have the
orthogonality relationship $A_H \cdot A_G^\top = 0$.

The assumption that $G$ is a $(k \log q) \to_0 (k' \log q)$-condenser implies that
the distribution

$G \cdot (A_G^\top \cdot U_{\mathbb{F}_q^k}^\top + a^\top)$,

where $U_{\mathbb{F}_q^k}$ stands for a uniformly random row vector in $\mathbb{F}_q^k$, is an affine source
of dimension at least $k'$, equivalent to saying that the matrix $G \cdot A_G^\top \in \mathbb{F}_q^{m \times k}$
has rank at least $k'$ (since rank is equal to the dimension of the image), or in
symbols,

(5.6)   $\mathrm{rank}(G \cdot A_G^\top) \ge k'$.

Observe that since we have assumed $\mathrm{rank}(G) = m$, its right kernel is
$(n - m)$-dimensional, and thus the linear mapping defined by $G$ cannot reduce more
than $n - m$ dimensions of the affine source $\mathcal{X}$. Thus, the quantity $n - k + k' - m$
is non-negative.

By a similar argument as above, in order to show the claim we need to
show that

$\mathrm{rank}(H \cdot A_H^\top) \ge n - k + k' - m$.

Suppose not. Then the right kernel of $H \cdot A_H^\top \in \mathbb{F}_q^{(n-m) \times (n-k)}$ must have
dimension larger than $(n - k) - (n - k + k' - m) = m - k'$. Denote this right
kernel by $\mathcal{R} \subseteq \mathbb{F}_q^{n-k}$. Since the matrix $A_H$ is assumed to have maximal rank
$n - k$, and $n - k \ge m - k'$, for each nonzero $y \in \mathcal{R}$, the vector $y \cdot A_H \in \mathbb{F}_q^n$
is nonzero and since $H \cdot (A_H^\top y^\top) = 0$ (by the definition of right kernel), the
duality of $G$ and $H$ implies that there is a nonzero $x \in \mathbb{F}_q^m$ where

$x \cdot G = y \cdot A_H$,

and the choice of $y$ uniquely specifies $x$. In other words, there is a subspace
$\mathcal{R}' \subseteq \mathbb{F}_q^m$ such that

$\dim(\mathcal{R}') = \dim(\mathcal{R})$,

and

$\{x \cdot G : x \in \mathcal{R}'\} = \{y \cdot A_H : y \in \mathcal{R}\}$.

But observe that, by orthogonality of $A_G$ and $A_H$, every $y$ satisfies $y \cdot A_H A_G^\top = 0$,
meaning that for every $x \in \mathcal{R}'$, we must have $x \cdot G A_G^\top = 0$. Thus the
left kernel of $G A_G^\top$ has dimension larger than $m - k'$ (since $\mathcal{R}'$ does), and
we conclude that the matrix $G A_G^\top$ has rank less than $k'$, a contradiction for
(5.6).

□
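The rank inequality at the heart of the proof can be checked experimentally over $\mathbb{F}_2$. The sketch below is an illustration for $q = 2$ only: it samples a random full-rank $G$ and $A_G$, computes dual matrices $H$ and $A_H$ as null-space bases, and compares $\mathrm{rank}(H A_H^\top)$ with $n - k + k' - m$, where $k'$ is taken to be $\mathrm{rank}(G A_G^\top)$. Vectors are stored as Python integers used as bit masks; all function names are hypothetical.

```python
import random
from typing import List, Tuple


def rref(rows: List[int], n: int) -> Tuple[List[int], List[int]]:
    """Reduced row-echelon basis over GF(2); each row is an n-bit integer."""
    rows = list(rows)
    basis: List[int] = []
    pivots: List[int] = []
    for col in range(n):
        bit = 1 << col
        pivot = next((r for r in rows if r & bit), None)
        if pivot is None:
            continue
        rows.remove(pivot)
        rows = [r ^ pivot if r & bit else r for r in rows]
        basis = [b ^ pivot if b & bit else b for b in basis]
        basis.append(pivot)
        pivots.append(col)
    return basis, pivots


def rank(rows: List[int], n: int) -> int:
    return len(rref(rows, n)[0])


def null_space(rows: List[int], n: int) -> List[int]:
    """Basis of all v with r . v = 0 over GF(2) for every row r."""
    basis, pivots = rref(rows, n)
    vectors = []
    for free in (c for c in range(n) if c not in pivots):
        v = 1 << free
        for b, p in zip(basis, pivots):
            if (b >> free) & 1:
                v |= 1 << p
        vectors.append(v)
    return vectors


def product_rank(a_rows: List[int], b_rows: List[int]) -> int:
    """Rank over GF(2) of A * B^T, with the rows of A and B given as integers."""
    prod = [sum((bin(a & b).count("1") & 1) << j for j, b in enumerate(b_rows))
            for a in a_rows]
    return rank(prod, len(b_rows))


def random_full_rank(r: int, n: int, rng: random.Random) -> List[int]:
    """Sample r random n-bit rows until they are linearly independent."""
    while True:
        rows = [rng.getrandbits(n) for _ in range(r)]
        if rank(rows, n) == r:
            return rows


def duality_check(n: int, m: int, k: int, rng: random.Random) -> Tuple[int, int]:
    """Return (rank(H A_H^T), n - k + k' - m) with k' = rank(G A_G^T)."""
    G = random_full_rank(m, n, rng)
    A_G = random_full_rank(k, n, rng)
    H = null_space(G, n)       # n - m rows, orthogonal to every row of G
    A_H = null_space(A_G, n)   # n - k rows, orthogonal to every row of A_G
    k_prime = product_rank(G, A_G)
    return product_rank(H, A_H), n - k + k_prime - m


lhs, rhs = duality_check(n=10, m=4, k=6, rng=random.Random(1))
```

In each trial the first returned value should be at least the second, as the theorem asserts.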



Since every $k$-dimensional affine space over $\mathbb{F}_q^n$ has an $(n - k)$-dimensional
dual vector space, the above result combined with Proposition 5.1 directly
implies the following corollary:

Corollary 5.11. Suppose that the linear mapping defined by a matrix
$G \in \mathbb{F}_q^{m \times n}$ of rank $m < n$ is a $(k \log q) \to_\epsilon (k' \log q)$ condenser, for some $\epsilon < 1/2$.
Let $H \in \mathbb{F}_q^{(n-m) \times n}$ of rank $n - m$ be so that $G H^\top = 0$. Then, the linear
mapping defined by $H$ is an $(n - k) \log q \to_0 (n - k + k' - m) \log q$ condenser. □



Similarly, linear seeded condensers for affine sources define linear seeded
dual condensers for affine sources with complementary entropy (this is done
by taking the dual linear function for every fixing of the seed).

Two important special cases of the above results are related to affine extractors
and lossless condensers. When the linear mapping $G$ is an affine extractor
for $k$-dimensional distributions, the dual mapping $H$ becomes a lossless
condenser for $(n - k)$-dimensional spaces, and vice versa.



Johannes Brahms (1833-1897): Ballade Op. 10 No. 4 in B major (Andante con moto, espressivo).



Chapter 6

Codes on the Gilbert-Varshamov Bound

"I confess that Fermat's Theorem as an isolated proposition has
very little interest for me, because I could easily lay down a
multitude of such propositions, which one could neither prove
nor dispose of."

- Carl Friedrich Gauss



One of the central problems in coding theory is the construction of codes
with extremal parameters. Typically, one fixes an alphabet size $q$, and two
among the three fundamental parameters of the code (block-length, number
of codewords, and minimum distance), and asks about extremal values of the
remaining parameter such that there is a code over the given alphabet with
the given parameters. For example, fixing the minimum distance $d$ and the
block-length $n$, one may ask for the largest number of codewords $M$ such that
there exists a code over an alphabet with $q$ elements having $n, M, d$ as its
parameters, or in short, an $(n, M, d)_q$-code.

Answering this question in its full generality is extremely difficult, especially
when the parameters are large. For this reason, researchers have concentrated
on asymptotic assertions: to any $[n, \log_q M, d]_q$-code $C$ we associate
a point $(\delta(C), R(C)) \in [0,1]^2$, where $\delta(C) = d/n$ and $R(C) = \log_q M / n$ are
respectively the relative distance and rate of the code. A particular point
$(\delta, R)$ is called asymptotically achievable (over a $q$-ary alphabet) if there exists
a sequence $(C_1, C_2, \ldots)$ of codes of increasing block-length such that $\delta(C_i) \to \delta$
and $R(C_i) \to R$ as $i \to \infty$.

Even with this asymptotic relaxation the problem of determining the shape
of the set of asymptotically achievable points remains difficult. Let $\alpha_q(\delta)$ be
defined as the supremum of all $R$ such that $(\delta, R)$ is asymptotically achievable
over a $q$-ary alphabet. It is known that $\alpha_q$ is a continuous function of $\delta$ [105],
that $\alpha_q(0) = 1$ (trivial), and $\alpha_q(\delta) = 0$ for $\delta \ge (q - 1)/q$ (by the Plotkin
bound). However, for no $\delta \in (0, (q - 1)/q)$ and for no $q$ is the value of $\alpha_q(\delta)$
known.

What is known are lower and upper bounds for $\alpha_q$. The best lower bound
known is due to Gilbert and Varshamov [68, 157], which states that $\alpha_q(\delta) \ge
1 - h_q(\delta)$, where the $q$-ary entropy function $h_q$ is defined as

$h_q(\delta) := -\delta \log_q \delta - (1 - \delta) \log_q(1 - \delta) + \delta \log_q(q - 1)$.
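For concreteness, the $q$-ary entropy function and the resulting Gilbert-Varshamov rate $1 - h_q(\delta)$ can be evaluated numerically as in the following sketch. The function names are hypothetical, and the convention $0 \log 0 = 0$ is used at the endpoints.

```python
import math


def q_ary_entropy(delta: float, q: int = 2) -> float:
    """h_q(delta) = -d log_q d - (1 - d) log_q(1 - d) + d log_q(q - 1)."""
    if not 0.0 <= delta <= 1.0:
        raise ValueError("delta must lie in [0, 1]")
    if delta == 0.0:
        return 0.0
    if delta == 1.0:
        return math.log(q - 1, q)
    return (-delta * math.log(delta, q)
            - (1 - delta) * math.log(1 - delta, q)
            + delta * math.log(q - 1, q))


def gv_rate(delta: float, q: int = 2) -> float:
    """Gilbert-Varshamov rate 1 - h_q(delta) for delta < (q - 1)/q, and 0 beyond."""
    if delta >= (q - 1) / q:
        return 0.0
    return 1.0 - q_ary_entropy(delta, q)


binary_rate = gv_rate(0.11, 2)   # roughly one half
```

Note that $h_q$ attains its maximum value 1 at $\delta = (q-1)/q$, which matches the point beyond which the Plotkin bound forces the rate to zero.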

Up until 1982, years of research had made it plausible to think that this bound
is tight, i.e., that $\alpha_q(\delta) = 1 - h_q(\delta)$. Goppa's invention of algebraic-geometric
codes [72], and the subsequent construction of Tsfasman, Vladut, and Zink
[154] using curves with many points over a finite field and small genus showed,
however, that the bound is not tight when the alphabet size is large enough.
Moreover, Tsfasman et al. also gave a polynomial time construction of such
codes (which has been greatly simplified since, see, e.g., [67]).

The fate of the binary alphabet is still open. Many researchers still believe
that $\alpha_2(\delta) = 1 - h_2(\delta)$. In fact, for a randomly chosen linear code
$C$ (one in which the entries of a generator matrix are chosen independently
and uniformly over the alphabet) and for any positive $\epsilon$ we have $R(C) \ge
1 - h_q(\delta(C)) - \epsilon$ with high probability (with probability at least $1 - 2^{-n c_\epsilon}$,
where $n$ is the block-length and $c_\epsilon$ is a constant depending on $\epsilon$). However,
even though this shows that most randomly chosen codes are arbitrarily close
to the Gilbert-Varshamov bound, no explicit polynomial time construction of
such codes is known when the alphabet size is small (e.g., for binary alphabets).

In this chapter, we use the technology of pseudorandom generators which 
has played a prominent role in the theoretical computer science research in 
recent years to (conditionally) produce, for any block-length n and any rate 
R < 1, a list of poly(n) many codes of block length n and designed rate R 
(over an arbitrary alphabet) such that a very large fraction of these codes has 
parameters arbitrarily close to the Gilbert-Varshamov bound. Here, poly(n)
denotes a polynomial in n. 

In a nutshell, our construction is based on the pseudorandom generator
of Nisan and Wigderson [115]. In particular, we will first identify a Boolean
function $f$ of which we assume that it satisfies a certain complexity-theoretic
assumption. More precisely, we assume that the function cannot be computed
by algorithms that use a sub-exponential amount of memory. A natural
candidate for such a function is given later in the chapter. This function is
then extended to produce $nk$ bits from $O(\log n)$ bits. The extended function
is called a pseudorandom generator. The main point about this extended 
function is that the nk bits produced cannot be distinguished from random 
bits by a Turing machine with restricted resources. In our case, the output 
cannot be distinguished from a random sequence when a Turing machine is 
used which uses only an amount of space that is polynomially bounded in the 
length of its input. 

The new $nk$ bits are regarded as the entries of a generator matrix of a code.
Varying the base $O(\log n)$ bits in all possible ways gives us a polynomially long
list of codes of which we can show that a majority lies asymptotically on the
Gilbert-Varshamov bound, provided the hardness assumption is satisfied^1.

6.1 Basic Notation 

We begin with the definitions of the terms we will use throughout the chapter.
For simplicity, we restrict ourselves to the particular cases of our interest and
will avoid presenting the definitions in full generality. See Appendix A for a
quick review of the basic notions in coding theory and [117, 139] for
complexity-theoretic notions.

Our main tool in this chapter is a hardness-based pseudorandom gener- 
ator. Informally, this is an efficient algorithm that receives a sequence of 
truly random bits at input and outputs a much longer sequence looking ran- 
dom to any distinguisher with bounded computational power. This property 
of the pseudorandom generator can be guaranteed to hold by assuming the 
existence of functions that are hard to compute for certain computational devices.
This is indeed a broad sketch; depending on what we precisely mean by
the quantitative measures just mentioned, we come to different definitions of 
pseudorandom generators. Here we will be mainly interested in computational 
hardness against algorithms with bounded space complexity. 

Hereafter, we will use the shorthand DSPACE[$s(n)$] to denote the class
of problems solvable with $O(s(n))$ bits of working memory and E for the
class of problems solvable in time $2^{O(n)}$ (i.e., $\mathsf{E} = \bigcup_{c \in \mathbb{N}} \mathrm{DTIME}[2^{cn}]$, where
DTIME[$t(n)$] stands for the class of problems deterministically solvable in time
$O(t(n))$).

Certain arguments that we use in this chapter require non-uniform com- 
putational models. Hence, we will occasionally refer to algorithms that receive 
advice strings to help them carry out their computation. Namely, in addition 
to the input string, the algorithm receives an advice string whose content only 
depends on the length of the input and not the input itself. It is assumed that, 
for every n, there is an advice string that makes the algorithm work correctly 
on all inputs of length $n$. We will use the notation DSPACE[$f(n)$]/$g(n)$ for
the class of problems solvable by algorithms that receive $g(n)$ bits of advice
and use $O(f(n))$ bits of working memory.

Definition 6.1. Let $S\colon \mathbb{N} \to \mathbb{N}$ be a (constructible) function. A Boolean
function $f\colon \{0,1\}^* \to \{0,1\}$ is said to have hardness $S$ if for every algorithm
$A$ in DSPACE[$S(n)$]/$O(S(n))$ and infinitely many $n$ (and no matter how the
advice string is chosen) it holds that

$|\Pr_x[A(x) = f(x)] - 1/2| < 1/S(n)$,

where $x$ is uniformly sampled from $\{0,1\}^n$.

1 We remark that the method used in this chapter can be regarded as a "relativized"
variation of the original Nisan-Wigderson generator and, apart from construction of
error-correcting codes, can be applied to a vast range of probabilistic constructions of
combinatorial objects (e.g., Ramsey graphs, combinatorial designs, etc). Even though this
derandomization technique seems to be "folklore" among the theoretical computer science community,
it is included in the thesis mainly since there appears to be no elaborate and specifically
focused writeup of it in the literature.

Obviously, any Boolean function can be trivially computed correctly on at
least half of the inputs by an algorithm that always outputs a constant value
(either 0 or 1). Intuitively, for a hard function no efficient algorithm can do
much better. For the purpose of this chapter, the central hardness assumption
that we use is the following:

Assumption 1. There is a Boolean function in E with hardness at least $2^{\epsilon n}$,
for some constant $\epsilon > 0$.

The term pseudorandom generator emphasizes the fact that it is infor- 
mation-theoretically impossible to transform a sequence of truly random bits 
into a longer sequence of truly random bits, hence the best a transformation 
with a nontrivial stretch can do is to generate bits that look random to a 
particular family of observers. To make this more precise, we need to define 
computational indistinguishability first. 

Definition 6.2. Let $p = \{p_n\}$ and $q = \{q_n\}$ be families of probability
distributions, where $p_n$ and $q_n$ are distributed over $\{0,1\}^n$. Then $p$ and $q$ are
$(S, \ell, \epsilon)$-indistinguishable (for some $S, \ell\colon \mathbb{N} \to \mathbb{N}$ and $\epsilon\colon \mathbb{N} \to (0,1)$) if for every
algorithm $A$ in DSPACE[$S(n)$]/$O(\ell(n))$ and infinitely many $n$ (and no matter
how the advice string is chosen) we have that

$|\Pr_x[A(x) = 1] - \Pr_y[A(y) = 1]| < \epsilon(n)$,

where $x$ and $y$ are sampled from $p_n$ and $q_n$, respectively.

This is in a way similar to computational hardness. Here the hard task is 
telling the difference between the sequences generated by different sources. In 
other words, two probability distributions are indistinguishable if any resource- 
bounded observer is fooled when given inputs sampled from one distribution 
rather than the other. Note that this may even hold if the two distributions 
are not statistically close to each other. 
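The last remark can be illustrated with a toy computation. The sketch below computes, exactly, the distinguishing advantage of a given test on two small distributions. The uniform distribution on $\{0,1\}^4$ and the uniform distribution on even-parity strings are at statistical distance 1/2, yet a test that only inspects one bit sees no difference. This is an information-theoretic toy with hypothetical names, not a space-bounded algorithm in the sense of Definition 6.2.

```python
from fractions import Fraction
from itertools import product


def advantage(test, support_p, support_q) -> Fraction:
    """Exact |Pr[test(x) = 1] - Pr[test(y) = 1]| for x, y uniform on the given supports."""
    pr_p = Fraction(sum(test(x) for x in support_p), len(support_p))
    pr_q = Fraction(sum(test(y) for y in support_q), len(support_q))
    return abs(pr_p - pr_q)


n = 4
uniform = list(product((0, 1), repeat=n))
even_parity = [x for x in uniform if sum(x) % 2 == 0]


def first_bit(x) -> int:
    """A weak observer that looks at a single coordinate."""
    return x[0]


def parity_is_even(x) -> int:
    """A stronger observer that reads the whole string."""
    return int(sum(x) % 2 == 0)


weak = advantage(first_bit, uniform, even_parity)
strong = advantage(parity_is_even, uniform, even_parity)
```

Here the weak observer has advantage 0 while the parity test has advantage 1/2, so whether two distributions "look the same" depends on the class of observers considered.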

Now we are ready to define pseudorandom generators we will later need. 

Definition 6.3. A deterministic algorithm that computes a function 

$G\colon \{0,1\}^{c \log n} \to \{0,1\}^n$

(for some constant c > 0) is called a (high-end) pseudorandom generator if 
the following conditions hold: 






1. It runs in polynomial time with respect to n. 

2. Let the probability distribution $G_n$ be defined uniformly over the range
of $G$ restricted to outputs of length $n$. Then the family of distributions
$\{G_n\}$ is $(n, n, 1/n)$-indistinguishable from the uniform distribution.

An input to the pseudorandom generator is referred to as a random seed.
Here the length of the output as a function of the seed length $s$, known as
the stretch of the pseudorandom generator, is required to be the exponential
function $2^{s/c}$.



6.2 The Pseudorandom Generator 

A pseudorandom generator, as we just defined, extends a truly random se- 
quence of bits into an exponentially long sequence that looks random to any 
efficient distinguisher. From the definition it is not at all clear whether such 
an object could exist. In fact the existence of pseudorandom generators (even 
much weaker than our definition) is not yet known. However, there are various 
constructions of pseudorandom generators based on unproven (but seemingly 
plausible) assumptions. The presumed assumption is typically chosen in line 
with the same guideline, namely, a computational task being intractable. For
instance, the early constructions of [135] and [15] are based on the intractability
of certain number-theoretic problems, namely, integer factorization and
the discrete logarithm function. Yao [166] extends these ideas to obtain
pseudorandomness from one-way permutations. This is further generalized by [80]
who show that the existence of any one-way function is sufficient. However,
these ideas are mainly motivated by cryptographic applications and often re- 
quire strong assumptions. 

The prototypical pseudorandom generator for the applications in derandomization,
which is of our interest, is due to Nisan and Wigderson [115]. They
provide a broad range of pseudorandom generators with different strengths 
based on a variety of hardness assumptions. In rough terms, their generator 
works by taking a hard function for a certain complexity class, evaluating 
it in carefully chosen points (related to the choice of the random seed), and 
outputting the resulting sequence. Then one can argue that an efficient distin- 
guisher can be used to efficiently compute the hard function, contradicting the 
assumption. Note that for certain complexity classes, hard functions are prov- 
ably known. However, they typically give generators too weak to be applied 
in typical derandomizations. Here we simply apply the Nisan-Wigderson
construction to obtain a pseudorandom generator which is robust against
space-efficient computations. This is shown in the following theorem:

Theorem 6.4. Assumption 1 implies the existence of a pseudorandom generator
as in Definition 6.3. That is to say, suppose that there is a constant
$\epsilon > 0$ and a Boolean function computable in time $2^{O(n)}$ that has hardness
$2^{\epsilon n}$. Then there exists a function $G\colon \{0,1\}^{O(\log n)} \to \{0,1\}^n$ computable in
time polynomial in $n$ whose output (when given uniformly random bits at
input) is indistinguishable from the uniform distribution for all algorithms in
DSPACE[$n$]/$O(n)$.

Proof. [115] Let $f$ be a function satisfying Assumption 1 for some fixed $\epsilon > 0$,
and recall that we intend to generate $n$ pseudorandom bits from a truly random
seed of length $\ell$ which is only logarithmically long in $n$.

The idea of the construction is as follows: We evaluate the hard function
$f$ in $n$ carefully chosen points, each of the same length $m$, where $m$ is to
be determined shortly. Each of these $m$-bit long inputs is obtained from a
particular subset of the $\ell$ bits provided by the random seed. This can be
conveniently represented in a matrix form: Let $\mathcal{D}$ be an $n \times \ell$ binary matrix, each
row of which has the same weight $m$. Now the pseudorandom generator $G$
is described as follows: The $i$th bit generated by $G$ is the evaluation of $f$ on
the projection of the $\ell$-bit long input sequence to those coordinates indicated
by the $i$th row of $\mathcal{D}$. Note that because $f$ is in E, the output sequence can be
computed in time polynomial in $n$, as long as $m$ is logarithmically small.
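The evaluation rule just described can be sketched in a few lines. In this illustration each row of the design is stored as the tuple of seed positions it selects (instead of a 0/1 row of the matrix), and the function `toy_f` is a placeholder majority function. It is emphatically NOT a hard function, so the output below is not pseudorandom; only the wiring of the generator is shown, and all names are hypothetical.

```python
from typing import Callable, List, Sequence, Tuple


def nw_generator(seed: Sequence[int],
                 design: Sequence[Sequence[int]],
                 f: Callable[[Tuple[int, ...]], int]) -> List[int]:
    """The i-th output bit is f applied to the seed bits selected by the i-th design row."""
    return [f(tuple(seed[j] for j in row)) for row in design]


def toy_f(bits: Tuple[int, ...]) -> int:
    """Placeholder: majority of three bits. Not a hard function."""
    return int(sum(bits) >= 2)


# Toy design with seed length 6, row weight 3, and pairwise overlaps of size at most 1.
design = [(0, 1, 2), (0, 3, 4), (1, 3, 5), (2, 4, 5)]
seed = (1, 0, 1, 1, 0, 0)
output = nw_generator(seed, design, toy_f)
```

Enumerating all seeds of length $\ell = O(\log n)$ takes only polynomially many evaluations, which is what makes the list of candidate codes polynomially long.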

As we will shortly see, it turns out that we need $\mathcal{D}$ to satisfy a certain
small-overlap property. Namely, we require the bitwise product of each pair
of the rows of $\mathcal{D}$ to have weight at most $\log n$. A straightforward counting
argument shows that, for a logarithmically large value of $m$, the parameter
$\ell$ can be kept logarithmically small as well. In particular, for the
choice of $m := \frac{2}{\epsilon} \log n$, the matrix $\mathcal{D}$ exists with $\ell = O(\log n)$. Moreover, rows
of the matrix can be constructed (in time polynomial in $n$) using a simple
greedy algorithm.
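A greedy construction of such a design can be sketched as follows. The code scans all weight-$m$ subsets of the seed positions in lexicographic order and keeps a subset whenever it overlaps every previously kept one in at most the allowed number of positions. This is one simple way to realize a greedy search and is not claimed to be the exact procedure intended in the text; the function name and the toy parameters are hypothetical. Since $\ell = O(\log n)$, the number of subsets scanned is at most $2^\ell$, which is polynomial in $n$.

```python
from itertools import combinations
from typing import List, Tuple


def greedy_design(n_rows: int, seed_len: int, weight: int,
                  max_overlap: int) -> List[Tuple[int, ...]]:
    """Pick n_rows subsets of range(seed_len), each of size `weight`,
    with pairwise intersections of size at most `max_overlap`."""
    chosen: List[Tuple[int, ...]] = []
    for candidate in combinations(range(seed_len), weight):
        if all(len(set(candidate) & set(row)) <= max_overlap for row in chosen):
            chosen.append(candidate)
            if len(chosen) == n_rows:
                return chosen
    raise ValueError("parameters too tight for the greedy search")


# Toy parameters: 5 rows of weight 4 over 12 seed positions, overlaps at most 1.
design = greedy_design(n_rows=5, seed_len=12, weight=4, max_overlap=1)
```

Each returned tuple lists the seed positions selected by one row of $\mathcal{D}$.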

To show that our construction indeed gives us a pseudorandom generator,
suppose that there is an algorithm $A$ working in DSPACE[$n$]/$O(n)$ which is
able to distinguish the output of $G$ from a truly random sequence with a bias
of at least $1/n$. That is, for all large enough $n$ it holds that

$\delta := |\Pr_y[A^{a(n)}(y) = 1] - \Pr_x[A^{a(n)}(G(x)) = 1]| \ge 1/n$,

where $x$ and $y$ are distributed uniformly in $\{0,1\}^\ell$ and $\{0,1\}^n$, respectively,
and $a(n)$ in the superscript denotes an advice string of linear length (that
only depends on $n$). The goal is to transform $A$ into a space-efficient (and
non-uniform) algorithm that approximates $f$, obtaining a contradiction.

Without loss of generality, let the quantity inside the absolute value be
non-negative (the argument is similar for the negative case). Let the distribution
$D_i$ (for $0 \le i \le n$) over $\{0,1\}^n$ be defined by concatenation of the length-$i$
prefix of $G(x)$, when $x$ is chosen uniformly at random from $\{0,1\}^\ell$, with a
Boolean string of length $n - i$ obtained uniformly at random. Define $p_i$ as
$\Pr_z[A^{a(n)}(z) = 1]$, where $z$ is sampled from $D_i$, and let $\delta_i := p_{i-1} - p_i$. Note
that $D_0$ is the uniform distribution and $D_n$ is uniformly distributed over the
range of $G$. Hence, we have $\sum_{i=1}^{n} \delta_i = p_0 - p_n = \delta \ge 1/n$, meaning that for
some $i$, $\delta_i \ge 1/n^2$. Fix this $i$ in the sequel.
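The hybrid argument can be checked exactly on a toy example. The sketch below computes the probabilities $p_i$ for the hybrid distributions $D_i$ by exhaustive enumeration and verifies that the gaps telescope. The generator and the distinguisher are hypothetical toys chosen so that the computation is transparent; the generator is not pseudorandom. In this toy the gap $p_0 - p_n$ happens to be negative, which corresponds to the symmetric case mentioned in the text.

```python
from fractions import Fraction
from itertools import product
from typing import Callable, List, Tuple


def toy_generator(x: Tuple[int, ...]) -> Tuple[int, ...]:
    """Toy map from 2 bits to 4 bits; NOT pseudorandom."""
    return (x[0], x[1], x[0] ^ x[1], x[0] & x[1])


def toy_distinguisher(z: Tuple[int, ...]) -> int:
    """Outputs 1 iff the third bit equals the XOR of the first two."""
    return int(z[2] == (z[0] ^ z[1]))


def hybrid_probabilities(gen: Callable, test: Callable,
                         seed_len: int, n: int) -> List[Fraction]:
    """p_i = Pr[test(z) = 1] for z ~ D_i: first i bits of gen(x), then n - i uniform bits."""
    probs = []
    for i in range(n + 1):
        hits = total = 0
        for x in product((0, 1), repeat=seed_len):
            prefix = gen(x)[:i]
            for suffix in product((0, 1), repeat=n - i):
                hits += test(prefix + suffix)
                total += 1
        probs.append(Fraction(hits, total))
    return probs


p = hybrid_probabilities(toy_generator, toy_distinguisher, seed_len=2, n=4)
gaps = [p[i - 1] - p[i] for i in range(1, len(p))]
```

The whole gap is concentrated at $i = 3$, the position where the distinguisher's test becomes informative, and in general at least one gap must have magnitude at least $|p_0 - p_n|/n$.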

Without loss of generality, assume that the $i$th bit of $G(x)$ depends on the
first $m$ bits of the random seed. Now consider the following randomized procedure
$B$: Given $i - 1$ input bits $u_1, \ldots, u_{i-1}$, choose a binary sequence $r_i, \ldots, r_n$
uniformly at random and compute $A^{a(n)}(u_1, \ldots, u_{i-1}, r_i, \ldots, r_n)$. If the output
was 1 return $r_i$, otherwise, return the negation of $r_i$. It is straightforward
to show that

(6.1)   $\Pr_{x,r}[B(G(x)_1^{i-1}) = G(x)_i] \ge \frac{1}{2} + \delta_i$.

Here, $G(x)_1^{i-1}$ and $G(x)_i$ are shorthands for the $(i - 1)$-bit long prefix of $G(x)$
and the $i$th bit of $G(x)$, respectively, and the probability is taken over the
choice of $x$ and the internal coins of $B$.

So far we have constructed a linear-time probabilistic procedure for guessing
the $i$th pseudorandom bit from the first $i - 1$ bits. By averaging, we note
that there is a particular choice of $r_i, \ldots, r_n$, independent of $x$, that preserves
the bias given in (6.1). Furthermore, note that the function $G(x)_i$ we are trying
to guess, which is in fact $f(x_1, \ldots, x_m)$, does not depend on $x_{m+1}, \ldots, x_\ell$.
Therefore, again by averaging we see that these bits can also be fixed. Therefore,
for a given sequence $x_1, \ldots, x_m$, one can compute $G(x)_1^{i-1}$, feed it to
$B$ (having known the choices we have fixed), and guess $G(x)_i$ with the same
bias as in (6.1). The problem is of course that $G(x)_1^{i-1}$ does not seem to be
easily computable. However, what we know is that each bit of this sequence
depends only on $\log n$ bits of $x_1, \ldots, x_m$, as follows from the construction of $\mathcal{D}$.
Hence, having fixed $x_{m+1}, \ldots, x_\ell$, we can trivially describe each bit of $G(x)_1^{i-1}$
by a Boolean formula (or a Boolean circuit) of exponential size (that is, of size
$O(2^{\log n}) = O(n)$). These $i - 1 = O(n)$ Boolean formulae can be encoded as
an additional advice string of length $O(n^2)$ (note that their descriptions only
depend on $n$), implying that $G(x)_1^{i-1}$ can be computed in linear space using
$O(n^2)$ bits of advice.

All the choices we have fixed so far (namely, i, rj, . . . , r n , x m+ i, . . . , xi) only 
depend on n and can be absorbed into the advice string as well 2 . Combined 
with the bit-guessing algorithm we just described, this gives us a linear-space 
algorithm that needs an advice of quadratic length and correctly computes 
f(xi, . . . ,x m ) on at least a \ + Si fraction of inputs, which is off from 1/2 
by a bias of at least 1/n 2 . But this is not possible by the hardness of /, 
which is assumed to be at least 2 em = n 2 . Thus, G must be a pseudorandom 
generator. □ 



Footnote 2: Alternatively, one can avoid using this additional advice by
enumerating over all possible choices and taking a majority vote. However, this
does not decrease the total advice length by much.

The above proof uses a function that is completely unpredictable for every
efficient algorithm. Impagliazzo and Wigderson [85] improve the construction
to show that this requirement can be relaxed to one that only requires
worst-case hardness, meaning that the function computed by any efficient
(non-uniform) algorithm needs to differ from the hard function on at least one
input. In our application, this translates into the following hardness
assumption:

Assumption 2. There is a constant $\epsilon > 0$ and a function $f$ in E such
that every algorithm in DSPACE[S(n)]/O(S(n)) that correctly computes $f$
requires $S(n) = \Omega(2^{\epsilon n})$.

The idea of their result (which was later reproved in [146] using a
coding-theoretic argument) is to amplify the given hardness, that is, to
transform a worst-case hard function in E to another function in E which is
hard on average. In our setting, this gives us the following (since the proof
essentially carries over without change, we only sketch the idea):

Theorem 6.5. Assumption 2 implies Assumption 1 and hence, the existence
of pseudorandom generators.

Proof Idea. [146] Let a function $f$ be hard in the worst case. Consider the
truth table of $f$ as a string $x$ of length $N := 2^n$. The main ingredient of
the proof is a linear code $C$ with dimension $N$ and length polynomial in $N$,
which is obtained by concatenation of a Reed-Muller code with the Hadamard
code. The code is list-decodable up to a fraction $\frac{1}{2} - \epsilon$ of
errors, for arbitrary $\epsilon > 0$. Moreover, decoding can be done in
sub-linear time, that is, by querying the received word only at a small number
of (randomly chosen) positions. Then the truth table of the transformed
function $g$ can be simply defined as the encoding of $x$ with $C$. Hence $g$
can be evaluated at any point in time polynomial in $N$, which shows that
$g \in$ E. Further, suppose that an algorithm $A$ can space-efficiently compute
$g$ correctly in a fraction of points non-negligibly bounded away from 1/2
(possibly using an advice string). Then the function computed by $A$ can be
seen as a corrupted version of the codeword $g$ and can be efficiently
recovered using the list-decoding algorithm. From this, one can obtain a
space-efficient algorithm for computing $f$, contradicting the hardness of $f$.
Hence $g$ has to be hard on average. □

While the above result seems to require hardness against non-uniform
algorithms (as phrased in Assumption 2), we will see that the hardness
assumption can be further relaxed to the following, which only requires
hardness against uniform algorithms:

Assumption 3. The complexity class E is not contained in DSPACE[$2^{o(n)}$].

Remark. A result by Hopcroft et al. [83] shows a deterministic simulation of
time by space. Namely, they prove that

    DTIME[$t(n)$] $\subseteq$ DSPACE[$t(n) / \log t(n)$].

However, this result is not strong enough to influence the hardness assumption
above. To violate the assumption, a much more space-efficient simulation in
the form

    DTIME[$t(n)$] $\subseteq$ DSPACE[$t(n)^{o(1)}$]

is required.

Before we show the equivalence of the two assumptions (namely, Assumption 2
and Assumption 3), we address the natural question of how to construct
an explicit function to satisfy the required hardness assumption (after all,
evaluation of such a function is needed as part of the pseudorandom generator
construction). One possible candidate (which is a canonical hard function for
E) is proposed in the following lemma:

Lemma 6.6. Let $L_E$ be the set (encoded in binary)

    $\{(M, x, t, i) \mid M$ is a Turing machine, where given
    input $x$ at time $t$ the $i$th bit of its configuration is 1$\}$,

and let the Boolean function $f_E$ be its characteristic function. Then if
Assumption 3 is true, it is satisfied by $f_E$.

Proof. First we show that $L_E$ is complete for E under Turing reductions
bounded in linear space. The language being in E directly follows from the
efficient constructions of universal Turing machines. Namely, given a
properly-encoded input $(M, x, t, i)$, one can simply simulate the Turing
machine $M$ on $x$ for $t$ steps and decide according to the configuration
obtained at time $t$. This indeed takes exponential time. Now let $L$ be any
language in E which is computable by a Turing machine $M$ in time $2^{cn}$, for
some constant $c > 0$. For a given $x$ of length $n$, using an oracle for
solving $L_E$, one can query the oracle with inputs of the form
$(M, x, 2^{cn}, i)$ (where the precise choice of $i$ depends on the particular
encoding of the configurations) to find out whether $M$ is in an accepting
state, and hence decide $L$. This can obviously be done in space linear in $n$,
which concludes the completeness of $L_E$. Now if Assumption 3 is true and is
not satisfied by $f_E$, this completeness result allows one to compute all
problems in E in sub-exponential space, which contradicts the assumption. □

The following lemma shows that this seemingly weaker assumption is in 
fact sufficient for our pseudorandom generator: 

Lemma 6.7. Assumptions 2 and 3 are equivalent.

Proof. This argument is based on [108, Section 5.3]. First we observe that,
given a black box $C$ that receives $n$ input bits and outputs a single bit, it
can be verified in linear space whether $C$ computes the restriction of $f_E$
to inputs of length $n$. To see this, consider an input of the form
$(M, x, t, i)$, as in the statement of Lemma 6.6. The correctness of $C$ can be
explicitly checked when the time parameter $t$ is zero (that is, $C$ has to
agree with the initial configuration of $M$). Moreover, for every time step
$t > 0$, the answer given by $C$ has to be consistent with that of the previous
time step (namely, the transition made at the location of the head should be
legal and every other position of the tape should remain unchanged). Thus, one
can verify $C$ simply by enumerating all possible inputs and verifying whether
the answer given by $C$ remains consistent across subsequent time steps. This
can obviously be done in linear space.



Now suppose that Assumption 3 is true and hence, by Lemma 6.6, is satisfied by
$f_E$. That is, there is a constant $\epsilon > 0$ such that every algorithm
for computing $f_E$ requires space $\Omega(2^{\epsilon n})$. Moreover, assume
that there is an algorithm $A$ working in DSPACE[S(n)]/O(S(n)) that computes
$f_E$. Using the verification procedure described above, one can (uniformly)
simulate $A$ in space $O(S(n))$ by enumerating all choices of the advice string
and finding the one that makes the algorithm work correctly. Altogether this
requires space $O(S(n))$. Combined with the hardness assumption, we conclude
that $S(n) = \Omega(2^{\epsilon n})$. The converse direction is obvious. □

Putting everything together, we obtain a very strong pseudorandom gen- 
erator as follows: 

Corollary 6.8. Assumption 3 implies the existence of pseudorandom generators
whose output of length $n$ is $(n, n, 1/n)$-indistinguishable from the uniform
distribution. □

6.3 Derandomized Code Construction 



As mentioned before, the bound given by Gilbert and Varshamov [68, 157]
states that, for a $q$-ary alphabet, large enough $n$, and for any value of
$0 < \delta < (q - 1)/q$, there are codes with length $n$, relative distance at
least $\delta$ and rate $r \ge 1 - h_q(\delta)$, where $h_q$ is the $q$-ary
entropy function. Moreover, a random linear code (having each entry of its
generator matrix chosen uniformly at random) achieves this bound. In fact, for
all $0 < r < 1$, in the family of linear codes with length $n$ and (designed)
dimension $nr$, all but a sub-constant fraction of the codes achieve the bound
when $n$ grows to infinity. However, the number of codes in the family is
exponentially large ($q^{nr}$) and we do not have an a priori indication on
which codes in the family are good. Putting it differently, a randomized
algorithm that merely outputs a random generator matrix succeeds in producing a
code on the GV bound with probability $1 - o(1)$. However, the number of random
bits needed by the algorithm is $nk \log q$, where $k = nr$ is the dimension.
For simplicity, in the sequel we only focus on binary codes, for which no
explicit construction approaching the GV bound is known.
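For concreteness, the rate promised by the bound can be evaluated numerically. The following is a minimal sketch using the standard definition of the $q$-ary entropy function, $h_q(\delta) = \delta \log_q(q-1) - \delta \log_q \delta - (1-\delta) \log_q(1-\delta)$; the function names are chosen here for illustration.

```python
import math


def q_ary_entropy(delta, q=2):
    """q-ary entropy h_q(delta) for 0 <= delta <= 1, with 0 * log 0 taken as 0."""
    if not 0.0 <= delta <= 1.0:
        raise ValueError("delta must lie in [0, 1]")

    def xlog(x):
        return 0.0 if x == 0 else x * math.log(x, q)

    return delta * math.log(q - 1, q) - xlog(delta) - xlog(1 - delta)


def gv_rate(delta, q=2):
    """Rate 1 - h_q(delta) from the GV bound, valid for 0 < delta < (q - 1) / q."""
    if not 0.0 < delta < (q - 1) / q:
        raise ValueError("delta must lie strictly between 0 and (q - 1) / q")
    return 1.0 - q_ary_entropy(delta, q)


# Binary codes of rate about 1/2 on the GV bound have relative distance near 0.11.
print(round(gv_rate(0.11), 4))
```

The rate decreases to zero as $\delta$ approaches $(q-1)/q$, where $h_q$ reaches its maximum value 1.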
The randomized procedure above can be considerably derandomized by
considering a more restricted family of codes. Namely, fix a length $n$ and
a basis for the finite field $\mathbb{F}_m$, where $m := 2^{n/2}$. Then over
such a basis there is a natural isomorphism between the elements of
$\mathbb{F}_m$ and the elements of the vector space $\mathbb{F}_2^{n/2}$. Now
for each $\alpha \in \mathbb{F}_m$, define the code $C_\alpha$ as the set
$\{(x, \alpha x) \mid x \in \mathbb{F}_m\}$, where the elements are encoded in
binary (see footnote 3). This binary code has rate 1/2. Further, it is well
known that $C_\alpha$ achieves the GV bound for all but an $o(1)$ fraction of
the choices of $\alpha$. Hence in this family a randomized construction can
obtain very good codes using only $n/2$ random bits. Here we see how the
pseudorandom generator constructed in the last section can dramatically reduce
the amount of randomness needed in all code constructions. Our observation is
based on the composition of the following facts:

1. Random codes achieve the Gilbert-Varshamov bound: It is well known
that a simple randomized algorithm that chooses the entries of a
generator matrix uniformly at random obtains a linear code satisfying the
Gilbert-Varshamov bound with probability $1 - o(1)$.

2. Finding the minimum distance of a (linear) code can be performed in
linear space: One can simply enumerate all the codewords to find the
minimum weight codeword, and hence, the distance of the code. This
only requires a linear amount of memory with respect to the block length.

3. Provided a hardness condition, namely that sub-exponential space
algorithms cannot compute all the problems in E, every linear space algorithm
can be fooled by an explicit pseudorandom generator: This is what we
obtained in Corollary 6.8.
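The restricted family $\{(x, \alpha x)\}$ and the enumeration in the second fact can be sketched on a small instance. The example below is an illustration under stated assumptions: it fixes the field with $2^4$ elements, represented in the polynomial basis modulo $X^4 + X + 1$ (one possible choice of basis, not prescribed by the text), so each code has length 8 and rate 1/2. The function names are hypothetical.

```python
def gf_mul(a, b, m=4, modulus=0b10011):
    """Multiply a and b in GF(2^m) = GF(2)[X]/(modulus).

    Elements are integers whose bits are polynomial coefficients; the default
    modulus encodes X^4 + X + 1, which is irreducible over GF(2).
    """
    result = 0
    while b:
        if b & 1:
            result ^= a      # add the current shifted copy of a
        b >>= 1
        a <<= 1
        if a >> m:           # degree reached m: reduce modulo the modulus
            a ^= modulus
    return result


def weight(v):
    """Hamming weight of the binary encoding of a field element."""
    return bin(v).count("1")


def min_distance(alpha, m=4, modulus=0b10011):
    """Minimum distance of {(x, alpha * x)} by enumerating nonzero codewords.

    Only the running minimum is stored, so the memory used is proportional to
    the block length, in the spirit of the second fact above.
    """
    return min(weight(x) + weight(gf_mul(alpha, x, m, modulus))
               for x in range(1, 1 << m))


# Distances over the whole family, one entry per choice of alpha.
distances = {alpha: min_distance(alpha) for alpha in range(16)}
print(sorted(distances.values()))
```

For such a tiny field the enumeration is instantaneous; the point of the text is that the same check uses only linear space, even though its running time is exponential in the block length.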



Now we formally propose a general framework that can be employed to 
derandomize a wide range of combinatorial constructions. 

Lemma 6.9. Let $S$ be a family of combinatorial objects of (binary-encoded)
length $n$, in which an $\epsilon$ fraction of the objects satisfy a property
$P$. Moreover, suppose that the family is efficiently samplable, that is, there
is a polynomial-time algorithm (in $n$) that, for a given $i$, generates the
$i$th member of the family. Further assume that the property $P$ is verifiable
in polynomial space. Then for every constant $k > 0$, under Assumption 3 there
is a constant $\ell$ and an efficiently samplable subset of $S$ of size at most
$n^\ell$ in which at least an $\epsilon - n^{-k}$ fraction of the objects
satisfy $P$.

Proof. Let $A$ be the composition of the sampling algorithm with the verifier
for $P$. By assumption, $A$ needs space $n^s$, for some constant $s$.
Furthermore, when the input of $A$ is chosen randomly, it outputs 1 with
probability at least $\epsilon$. Suppose that the pseudorandom generator of
Corollary 6.8 transforms $c \log n$ truly random bits into $n$ pseudorandom
bits, for some constant $c > 0$. Now it is just enough to apply the
pseudorandom generator on $c \cdot \max\{s, k\} \cdot \log n$ random bits and
feed $n$ of the resulting pseudorandom bits to $A$. By this construction, when
the input of the pseudorandom generator is chosen uniformly at random, $A$ must
still output 1 with probability at least $\epsilon - n^{-k}$, as otherwise the
pseudorandomness assumption would be violated. Now the combination of the
pseudorandom generator and $A$ gives the efficiently samplable family of the
objects we want, for $\ell := c \cdot \max\{s, k\}$, as the random seed runs
over all the possibilities. □

Footnote 3: These codes are attributed to J. M. Wozencraft (see [106]).
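The shape of this construction (enumerate every seed, stretch it, and feed the result to the sampler) can be sketched as follows. This is a schematic illustration only: `stand_in_generator` is a hypothetical placeholder that is not pseudorandom in any sense, since the generator of Corollary 6.8 rests on an unproven hardness assumption and is not implemented here. The sampler and the property are likewise toy choices.

```python
from itertools import product


def stand_in_generator(seed, n):
    """Hypothetical placeholder for the generator of Corollary 6.8.

    It simply repeats the seed until n bits are produced and offers no
    pseudorandomness guarantee; it only fixes the interface.
    """
    return tuple(seed[j % len(seed)] for j in range(n))


def derandomized_family(sampler, n, seed_len, generator=stand_in_generator):
    """All objects sampler(G(s)) as the seed s ranges over {0,1}^seed_len."""
    return [sampler(generator(seed, n))
            for seed in product((0, 1), repeat=seed_len)]


def fraction_satisfying(family, has_property):
    """Fraction of the listed objects that satisfy the property."""
    return sum(1 for obj in family if has_property(obj)) / len(family)


# Toy usage: objects are bit strings themselves, the property is "not all-zero".
family = derandomized_family(sampler=lambda bits: bits, n=8, seed_len=3)
print(len(family), fraction_satisfying(family, lambda bits: any(bits)))
```

With a seed of length $\ell \log n$ the list has $n^\ell$ entries, which is what makes the resulting family polynomially large.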

As the distance of a code is obviously computable in linear space by
enumeration of all the codewords, the above lemma immediately implies the
existence of a (constructible) polynomially large family of codes in which at
least a $1 - n^{-k}$ fraction of the codes achieve the GV bound, for arbitrary
$k$.

Remark. As shown in the original work of Nisan and Wigderson [115] (followed
by the hardness amplification of Impagliazzo and Wigderson [85]) all
randomized polynomial-time algorithms (namely, the complexity class BPP)
can be fully derandomized under the assumption that E cannot be computed
by Boolean circuits of sub-exponential size. This assumption is also sufficient
to derandomize probabilistic constructions that allow a (possibly non-uniform)
polynomial-time verification procedure for deciding whether a particular
object has the desirable properties. For the case of good error-correcting
codes, this could work if we knew of a procedure for computing the minimum
distance of a linear code using circuits of size polynomial in the length of
the code. However, it turns out that (the decision version of) this problem is
NP-complete [156], and even the approximation version remains NP-complete
[52]. This makes such a possibility unlikely.

However, a key observation, due to Klivans and van Melkebeek [92], shows
that the Nisan-Wigderson construction (as well as the Impagliazzo-Wigderson
amplification) can be relativized. Namely, starting from a hardness
assumption for a certain family of oracle circuits (i.e., Boolean circuits that
can use special gates to compute certain Boolean functions as a black box) one
can obtain pseudorandom generators secure against oracle circuits of the same
family. In particular, this implies that any probabilistic construction that
allows polynomial-time verification using NP oracles (including the
construction of good error-correcting codes) can be derandomized by assuming
that E cannot be computed by sub-exponential sized Boolean circuits that use NP
oracle gates. However, the result given by Lemma 6.9 can be used to derandomize
a more general family of probabilistic constructions, though it needs a
slightly stronger hardness assumption which is still plausible.

Chapter 7



"To achieve great things, two 
things are needed; a plan, and 
not quite enough time." 

— Leonard Bernstein 



Concluding Remarks 



In this thesis, we investigated the role of objects studied at the core of
theoretical computer science (namely, randomness extractors, condensers and
pseudorandom generators) in efficient construction of combinatorial objects
suitable for more practical applications. The applications being considered all
share a coding-theoretic flavor and include:

1. Wiretap coding schemes, where the goal is to provide
information-theoretic secrecy in a communication channel that is partially
observable by an adversary (Chapter 3);

2. Combinatorial group testing schemes, that allow for efficient
identification of sparse binary vectors using potentially unreliable
disjunctive measurements (Chapter 4);

3. Capacity achieving codes, which provide optimally efficient and
reliable transmission of information over unreliable discrete communication
channels (Chapter 5);

4. Codes on the Gilbert-Varshamov bound, which are error-correcting codes
whose rate-distance trade-off matches that achieved by probabilistic
constructions (Chapter 6).

We conclude the thesis with a brief and informal discussion of the obtained
results, open problems and possible directions for future research.

Wiretap Protocols 

In Chapter 3 we constructed rate-optimal wiretap schemes from optimal affine
extractors. The combinatorial structure of affine extractors guarantees almost
perfect privacy even in presence of linear manipulation of information. This
observation was the key for our constructions of information-theoretically
optimal schemes in presence of noisy channels, active intruders, and linear
network coding.

Although linear manipulation is sufficiently general for a wide range of
practical applications, it makes sense to consider different types of
intermediate processing as well. We showed in Section 3.7.3 that, at the cost
of giving up zero leakage, it is possible to use seeded extractors to provide
secrecy in presence of arbitrary forms of transformations. However, in order to
attain zero leakage, it becomes inevitable to construct seedless, invertible
extractors for a class of random sources that capture the nature of the
post-processing being allowed.

For example, suppose that the encoded information is transmitted through
a packet network towards a destination, where information is arbitrarily
manipulated by intermediate routers, but is routed from the source to the
destination through $k \ge 2$ separated paths. In this case, the intruder may
learn a limited amount of information from each of the $k$ components of the
network. Arguments similar to those presented in Chapter 3 can now be used to
show that the object needed for ensuring secrecy in this "route-disjoint"
setting is invertible, $k$-source extractors. Shaltiel [132] demonstrates that
his method for boosting the output size of extractors using output-length
optimal seeded extractors (that is the basis of our technique for making
seedless extractors invertible) can be extended to the case of two-source
extractors as well.

On the other hand, if the route-disjointness condition that is assumed in
the above example is not available, zero leakage can no longer be guaranteed
without imposing further restrictions (since, as discussed in Section 3.7.3,
this would require seedless extractors for general sources, which do not
exist). However, assume that the intermediate manipulations are carried out
by computationally bounded devices (a reasonable assumption to model the
real world). A natural candidate for modeling resource-bounded computation
is the notion of small-sized Boolean circuits. The secrecy problem for this
class of transformations leads to invertible extractors for the following class
of sources:

    For an arbitrary Boolean function $C\colon \{0,1\}^n \to \{0,1\}$ that is
    computable by Boolean circuits of bounded size, the source is uniformly
    distributed on the set of inputs $x \in \{0,1\}^n$ such that $C(x) = 0$
    (assuming that this set has a sufficiently large size).



In a recent work of Shaltiel [133], extractors of this type have been studied
under the notion of "extractors for recognizable sources" (a notion that can
be specialized to different sub-classes depending on the bounded model of
computation being considered).

On the other hand, Trevisan and Vadhan [153] introduce the related notion
of extractors for samplable sources, where a samplable source is defined as the
image of a small-sized circuit (having multiple outputs) when provided with a
uniformly random input. They proceed to show explicit constructions of such
extractors assuming suitable computational hardness assumptions (which turn
out to be to some extent necessary for such extractors to be constructible). It
is straightforward to see that their techniques can be readily extended to the
construction of explicit extractors for sources recognizable by small-sized
circuits (using even weaker hardness assumptions). However, the technique works
only when the source entropy is assured to be substantially large, and even so,
is unable to produce a nearly optimal output length. To this date, explicit
construction of better extractors, under mild computational assumptions, for
sources that are samplable (or recognizable) by small-sized circuits remains
an important open problem.

Observe that the technique of using extractors for construction of wiretap
protocols as presented in Chapter 3 achieves optimal rates only if the
wiretap channel (i.e., the channel that delivers the intruder's information) is
of erasure nature. That is, we have so far assumed that, after some possible
post-processing of the encoded information, the intruder observes an
arbitrarily chosen, but bounded, subset of the bits being transmitted and
remains unaware of the rest. There are different natural choices of the wiretap
channel that can be considered as well. For example, suppose that the intruder
observes a noisy version of the entire sequence being transmitted (e.g., when a
fraction of the encoded bits get randomly flipped before being delivered to the
intruder). An interesting question is to see whether invertible extractors (or
a suitable related notion) can be used to construct information-theoretically
optimal schemes for such variations as well.

Group Testing 

Non-adaptive group testing schemes are fundamental combinatorial objects of
both theoretical and practical interest. As we showed in Chapter 4, strong
condensers can be used as building blocks in construction of noise-resilient
group testing and threshold group testing schemes.

The factors that greatly influence the quality of our constructions are 
the seed length and output length of the condenser being used. As we saw, 
in order to obtain an asymptotically optimal number of measurements, we 
need explicit constructions of extractors and lossless condensers that achieve 
a logarithmic seed length, and output length that is different from the source 
entropy by small additive terms. While, as we saw, there are very good existing 
constructions of both extractors and lossless condensers that can be used, they 
are still sub-optimal in the above sense. Thus, any improvement on the state 
of the art in explicit construction of extractors and lossless condensers will 
immediately improve the qualities of our explicit constructions. 

Moreover, our constructions of noise-resilient schemes with sublinear
decoding time demonstrate a novel application for list-decodable extractors and
condensers. This motivates further investigation of these objects for
improvement of their qualities.

In Section 4.3, we introduced the combinatorial notion of $(d, e; u)$-regular
matrices, which is used as an intermediate tool towards obtaining threshold
testing designs. Even though our construction, assuming an optimal lossless
condenser, matches the probabilistic upper bound for regular matrices, the
number of measurements in the resulting threshold testing scheme will be
larger than the probabilistic upper bound by a factor of $\Omega(d \log n)$.
Thus, an outstanding question is coming up with a direct construction of
disjunct matrices that match the probabilistic upper bound.

Despite this, the notion of regular matrices may be of independent interest, 
and an interesting question is to obtain (nontrivial) concrete lower bounds on 
the number of rows of such matrices in terms of the parameters d, e, u. 

Moreover, in our constructions we have assumed the threshold $u$ to be a
fixed constant, allowing the constants hidden in asymptotic notions to have
a poor dependence on $u$. An outstanding question is whether the number
of measurements can be reasonably controlled when $u$ becomes large; e.g.,
$u = \Omega(d)$.

Another interesting problem is decoding in the threshold model. While
our constructions can combinatorially guarantee identification of sparse
vectors, for applications it is important to have an efficient reconstruction
algorithm as well. Contrary to the case of strongly disjunct matrices that
allow a straightforward decoding procedure (cf. [27]), it is not clear whether
in general our notion of disjunct matrices allows efficient decoding, and thus
it becomes important to look for constructions that are equipped with efficient
reconstruction algorithms.

Finally, for clarity of the exposition, in this presentation we have only 
focused on asymptotic trade-offs, and it would be nice to obtain good, non- 
asymptotic, estimates on the obtained bounds that are useful for applications. 

Capacity Achieving Codes 

The general construction of capacity-achieving codes presented in Chapter 5
can be used to obtain a polynomial-sized ensemble of codes of any given block
length $n$, provided that nearly optimal linear extractors or lossless
condensers are available. In particular, this would require a logarithmic seed
length and an output length which is different from the input entropy by an
arbitrarily small constant fraction of the entropy. Both extractors and
lossless condensers constructed by Guruswami, Umans, and Vadhan [78] achieve
this goal, and as we saw in Chapter 2, their lossless condenser can be easily
made linear. However, to the best of our knowledge, to this date no explicit
construction of a linear extractor with logarithmic seed length that extracts
even a constant fraction of the source entropy is known.

Another interesting problem concerns the duality principle presented in
Section 5.5. As we showed, linear affine extractors and lossless condensers are
dual objects. It would be interesting to see whether a more general duality
principle exists between extractors and lossless condensers. It is not hard to
use basic Fourier analysis to slightly generalize our result to linear
extractors and lossless condensers for more general (not necessarily affine)
sources. However, since condensers for general sources are allowed to have a
positive, but negligible error (which is not the case for linear affine
condensers), controlling the error to a reasonable level becomes a tricky task,
and forms an interesting problem for future research.

The Gilbert- Varshamov Bound 

As we saw in Chapter 6, a suitable computational assumption implies a
deterministic polynomial-time algorithm for explicit construction of
polynomially many linear codes of a given length $n$, almost all of which
attain the Gilbert-Varshamov bound. That is, a randomly chosen code from such a
short list essentially behaves like a fully random code and in particular, is
expected to attain the same rate-distance tradeoff.

An important question that remains unanswered is whether a single code of
length $n$ attaining the Gilbert-Varshamov bound can be efficiently constructed
from a list of poly($n$) codes in which an overwhelming fraction attain the
bound. In effect, we are looking for an efficient code product to combine a
polynomially long list of codes (that may contain a few unsatisfactory codes)
into a single code that possesses the qualities of the overwhelming majority of
the codes in the ensemble. Since the computational problem of determining
(or even approximating) the minimum distance of a linear code is known to
be intractable, such a product cannot be constructed by simply examining the
individual codes. It is also interesting to consider impossibility results,
that is, models under which such a code product may become as difficult to
construct as finding a good code "from scratch".

Finally, a challenging problem which still remains open is explicit
construction of codes (or even small ensembles of codes) that attain the
Gilbert-Varshamov bound without relying on unproven assumptions. For
sufficiently large alphabets (i.e., of size 49 or higher), geometric Goppa
codes are known to even surpass the GV bound [154]. However, for smaller
alphabets, or rates close to zero over constant-sized alphabets, no explicit
construction attaining the GV bound is known. It also remains unclear whether
in such cases the GV bound is optimal; that is, whether there are families of
codes, not necessarily explicit, that beat the bound.

Alexander Scriabin (1872-1915): Piano Sonata No. 2 in G sharp minor
(Op. 19, "Sonata-Fantasy").



Appendix A 



"A classic is a book that has 
never finished saying what it has
to say. " 

— Italo Calvino 



A Primer on Coding Theory 



In this appendix, we briefly overview the essential notions of coding theory
that we have used in the thesis. For an extensive treatment of the theory of
error-correcting codes (and in particular, the facts collected in this
appendix), we refer the reader to the books by MacWilliams and Sloane [103],
van Lint [98], and Roth [127] on the topic.



A.1 Basics

Let $\Sigma$ be a finite alphabet of size $q > 1$. A code $C$ of length $n$
over $\Sigma$ is a non-empty subset of $\Sigma^n$. Each element of $C$ is
called a codeword and $|C|$ defines the size of the code. The rate of $C$ is
defined as $\log_q |C| / n$. An important choice for the alphabet is
$\Sigma = \{0, 1\}$, which results in a binary code. Typically, we assume that
$q$ is a prime power and take $\Sigma$ to be the finite field $\mathbb{F}_q$.
The Hamming distance between vectors $w := (w_1, \ldots, w_n) \in \Sigma^n$ and
$w' := (w'_1, \ldots, w'_n) \in \Sigma^n$ is defined as the number of positions
at which $w$ and $w'$ differ. Namely,

    $\mathrm{dist}(w, w') := |\{i \in [n] : w_i \ne w'_i\}|.$

The Hamming weight of a vector $w \in \mathbb{F}_q^n$ (denoted by
$\mathrm{wgt}(w)$) is the number of its nonzero coordinates; i.e.,

    $\mathrm{wgt}(w) := |\{i \in [n] : w_i \ne 0\}|.$

Therefore, when $w, w' \in \mathbb{F}_q^n$, we have

    $\mathrm{dist}(w, w') = \mathrm{wgt}(w - w').$
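These definitions translate directly into code. The sketch below assumes, for simplicity, that $q$ is prime so that arithmetic in the field is arithmetic modulo $q$ (for prime powers a proper field implementation would be needed); the function names are illustrative.

```python
def hamming_distance(w, w_prime):
    """Number of positions at which the two vectors differ."""
    if len(w) != len(w_prime):
        raise ValueError("vectors must have the same length")
    return sum(1 for a, b in zip(w, w_prime) if a != b)


def hamming_weight(w):
    """Number of nonzero coordinates of the vector."""
    return sum(1 for a in w if a != 0)


def subtract_mod_q(w, w_prime, q):
    """Coordinate-wise difference w - w' over the integers modulo a prime q."""
    return tuple((a - b) % q for a, b in zip(w, w_prime))


w, w_prime, q = (1, 0, 4, 2), (1, 3, 4, 0), 5
print(hamming_distance(w, w_prime), hamming_weight(subtract_mod_q(w, w_prime, q)))
```

Both printed values agree, as the identity relating distance and weight requires.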

The minimum distance of a code $C \subseteq \Sigma^n$ is the quantity

    $\mathrm{dist}(C) := \min_{w, w' \in C,\ w \ne w'} \mathrm{dist}(w, w'),$

and the relative distance of the code is defined as $\mathrm{dist}(C)/n$. A
family of codes of growing block lengths $n$ is called asymptotically good if,
for large enough $n$, it achieves a positive constant rate (i.e., independent
of $n$) and a positive constant relative distance.

A code $C \subseteq \mathbb{F}_q^n$ is called linear if it is a vector subspace
of $\mathbb{F}_q^n$. In this case, the dimension of the code is defined as its
dimension as a subspace, and the rate would be given by the dimension divided
by $n$. A code $C$ with minimum distance $d$ is denoted by the shorthand
$(n, \log_q |C|, d)_q$, and when $C$ is linear with dimension $k$, by
$[n, k, d]_q$. The subscript $q$ is omitted for binary codes. Any linear code
must include the all-zeros word $0^n$. Moreover, due to the linear structure of
such codes, the minimum distance of a linear code is equal to the minimum
Hamming weight of its nonzero codewords.

A generator matrix $G$ for a linear $[n, k, d]_q$-code $C$ is a $k \times n$
matrix of rank $k$ over $\mathbb{F}_q$ such that

\[ C = \{ xG : x \in \mathbb{F}_q^k \}. \]

Moreover, a parity check matrix $H$ for $C$ is an $r \times n$ matrix over
$\mathbb{F}_q$ of rank $n - k$, for some $r \geq n - k$, such that$^1$

\[ C = \{ x \in \mathbb{F}_q^n : H x^\top = 0 \}. \]

Any two such matrices are orthogonal to one another, in that we must have
$G H^\top = 0$. It is easy to verify that, if $C$ has minimum distance $d$, then
every choice of up to $d - 1$ columns of $H$ is linearly independent, and there
is a set of $d$ columns of $H$ that are dependent (and the dependency is given
by a codeword of minimum weight).
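
As a concrete illustration of these statements, the sketch below uses the binary $[7, 4, 3]$ Hamming code, which is an example chosen here and not one discussed in the text. It checks that $G H^\top = 0$, that the minimum nonzero weight is 3, and that any two columns of $H$ are independent. The helper names `encode` and `syndrome` are assumptions of this sketch.

```python
from itertools import product

# Systematic generator matrix G = [I | P] of the binary [7, 4, 3] Hamming code
# and the matching parity check matrix H = [P^T | I] (illustrative choice).
G = [
    [1, 0, 0, 0, 1, 1, 0],
    [0, 1, 0, 0, 1, 0, 1],
    [0, 0, 1, 0, 0, 1, 1],
    [0, 0, 0, 1, 1, 1, 1],
]
H = [
    [1, 1, 0, 1, 1, 0, 0],
    [1, 0, 1, 1, 0, 1, 0],
    [0, 1, 1, 1, 0, 0, 1],
]


def encode(x, G, q=2):
    """Return the codeword xG, with arithmetic modulo a prime q."""
    return tuple(sum(xi * row[j] for xi, row in zip(x, G)) % q
                 for j in range(len(G[0])))


def syndrome(H, y, q=2):
    """Return H y^T as a tuple, with arithmetic modulo a prime q."""
    return tuple(sum(h * c for h, c in zip(row, y)) % q for row in H)


codewords = [encode(x, G) for x in product(range(2), repeat=len(G))]

# G H^T = 0: every row of G (itself a codeword) has zero syndrome.
g_ht_is_zero = all(syndrome(H, row) == (0, 0, 0) for row in G)

# For a linear code, the minimum distance is the minimum nonzero weight.
min_distance = min(sum(w) for w in codewords if any(w))

# Over F_2, two columns are independent iff both are nonzero and distinct.
columns = list(zip(*H))
pairs_independent = (all(any(c) for c in columns)
                     and len(set(columns)) == len(columns))
```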

The dual of a linear code $C$ of length $n$ over $\mathbb{F}_q$ (denoted by
$C^\perp$) is defined as the dual vector space of the code; i.e., the set of
vectors in $\mathbb{F}_q^n$ that are orthogonal to every codeword in $C$:

\[ C^\perp := \{ c \in \mathbb{F}_q^n : (\forall w \in C)\; c \cdot w^\top = 0 \}. \]

The dual of a $k$-dimensional code has dimension $n - k$, and
$(C^\perp)^\perp = C$. Moreover, a generator matrix for the code $C$ is a parity
check matrix for $C^\perp$ and vice versa.
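
For very small codes the dual can be computed by exhaustive search, which makes the two facts above easy to check. This is a brute-force sketch over a prime field; the function name `dual_code` and the example (the binary repetition code of length 3) are illustrative assumptions.

```python
from itertools import product


def dual_code(code, n, q=2):
    """All vectors of F_q^n (q prime) orthogonal to every word in `code`."""
    return {
        c for c in product(range(q), repeat=n)
        if all(sum(a * b for a, b in zip(c, w)) % q == 0 for w in code)
    }


# Illustrative example: the binary repetition code of length 3 (dimension 1).
repetition = {(0, 0, 0), (1, 1, 1)}
dual = dual_code(repetition, 3)      # the even-weight code, dimension 3 - 1 = 2
double_dual = dual_code(dual, 3)     # taking the dual twice returns the code
```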

An encoder for a code $C$ with $q^k$ codewords is a function
$E : \Sigma^k \to \Sigma^n$ whose image is the code $C$. In particular, this
means that $E$ must be injective (one-to-one). Moreover, any generator matrix
for a linear code defines the encoder $E(x) := xG$. The input $x$ is referred
to as the message. We will consider a code explicit if it is equipped with a
polynomial-time computable encoder. For linear codes, this is equivalent to
saying that there is a deterministic polynomial-time algorithm (in the length
$n$) that outputs a generator, or parity check, matrix for the code$^2$.

$^1$ Here we consider vectors as row vectors, and denote column vectors (e.g.,
$x^\top$) as transposes of row vectors.

$^2$ There are stricter possibilities for considering a code explicit; e.g., one
may require each entry of a generator matrix to be computable in logarithmic
space.

Given a message $x \in \Sigma^k$, assume that an encoding of $x$ is obtained
using an encoder $E$; i.e., $y := E(x) \in \Sigma^n$. Consider a communication
channel through which the encoded sequence $y$ is communicated. The output of
the channel $\tilde{y} \in \Sigma^n$ is delivered to a receiver, whose goal is
to reconstruct $x$ from $\tilde{y}$. Ideally, if the channel is perfect, we
will have $\tilde{y} = y$ and, since $E$ is injective, deducing $x$ amounts to
inverting the function $E$, which is an easy task for linear codes (in general,
this can be done using Gaussian elimination). However, consider a closest
distance decoder $D : \Sigma^n \to \Sigma^k$ that, given $\tilde{y}$, outputs an
$x \in \Sigma^k$ for which $\mathrm{dist}(E(x), \tilde{y})$ is minimized. It is
easy to see that, even if we allow the channel to arbitrarily alter up to
$t := \lfloor (d - 1)/2 \rfloor$ of the symbols in the transmitted sequence $y$
(in symbols, if $\mathrm{dist}(y, \tilde{y}) \leq t$), then we can still ensure
that $x$ is uniquely deducible from $\tilde{y}$; in particular, we must have
$D(\tilde{y}) = x$.
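
The closest distance decoder can be written down literally as an exhaustive search. The sketch below does this for the binary $[7, 4, 3]$ Hamming code, an illustrative choice for which $t = 1$. The search runs over all $2^k$ messages, so it only illustrates the definition and is not an efficient decoder.

```python
from itertools import product

# Generator matrix of the binary [7, 4, 3] Hamming code (illustrative choice).
G = [
    [1, 0, 0, 0, 1, 1, 0],
    [0, 1, 0, 0, 1, 0, 1],
    [0, 0, 1, 0, 0, 1, 1],
    [0, 0, 0, 1, 1, 1, 1],
]
D_MIN = 3
T = (D_MIN - 1) // 2   # unique-decoding threshold t = 1


def encode(x):
    """E(x) := xG over F_2."""
    return tuple(sum(xi * row[j] for xi, row in zip(x, G)) % 2
                 for j in range(len(G[0])))


def closest_distance_decode(received):
    """Return a message whose encoding is nearest to `received`."""
    def distance_to(x):
        return sum(a != b for a, b in zip(encode(x), received))
    return min(product(range(2), repeat=len(G)), key=distance_to)


message = (1, 0, 1, 1)
sent = encode(message)
# The channel flips one symbol (position 2), which is within the threshold T.
received = tuple(b ^ 1 if i == 2 else b for i, b in enumerate(sent))
decoded = closest_distance_decode(received)
```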

For a linear code over $\mathbb{F}_q$ with parity check matrix $H$, a syndrome
corresponding to a sequence $y \in \mathbb{F}_q^n$ is the vector $H y^\top$.
Thus, $y$ is a codeword if and only if its corresponding syndrome is the zero
vector. Therefore, in the channel model above, if the syndrome corresponding to
the received word $\tilde{y}$ is nonzero, we can be certain that
$\tilde{y} \neq y$. The converse is not necessarily true. However, it is a
simple exercise to see that if $\tilde{y}, \tilde{y}' \in \mathbb{F}_q^n$ are
such that $\tilde{y} \neq \tilde{y}'$ and moreover
$\mathrm{dist}(y, \tilde{y}) \leq t$ and $\mathrm{dist}(y, \tilde{y}') \leq t$,
then the corresponding syndromes must be different; i.e.,
$H \tilde{y}^\top \neq H \tilde{y}'^\top$. Indeed, $\tilde{y} - \tilde{y}'$ is
a nonzero vector of weight at most $2t < d$ and thus cannot be a codeword.
Therefore, provided that the number of errors is no more than the
"unique-decoding threshold" $t$, it is "combinatorially" possible to uniquely
identify the error pattern $\tilde{y} - y$ from the syndrome corresponding to
the received word, and hence to reconstruct $x$. This task is known as syndrome
decoding. Ideally, one would also like an efficient algorithm for syndrome
decoding that runs in time polynomial in the length of the code. In general,
syndrome decoding for a linear code defined by its parity check matrix is
NP-hard (see [12]). However, a variety of explicit code constructions are
equipped with efficient syndrome decoding algorithms.
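
For a small code, syndrome decoding can be carried out with a lookup table from syndromes to error patterns of weight at most $t$. The sketch below does this for the binary $[7, 4, 3]$ Hamming code (an illustrative choice, with $t = 1$). The table has one entry per low-weight error pattern, so this approach does not scale and is not one of the efficient algorithms mentioned above.

```python
from itertools import product

# Parity check matrix of the binary [7, 4, 3] Hamming code (illustrative).
H = [
    [1, 1, 0, 1, 1, 0, 0],
    [1, 0, 1, 1, 0, 1, 0],
    [0, 1, 1, 1, 0, 0, 1],
]
N, T = 7, 1


def syndrome(y):
    """Return H y^T over F_2 as a tuple."""
    return tuple(sum(h * c for h, c in zip(row, y)) % 2 for row in H)


# Error patterns of weight <= T have pairwise distinct syndromes, so the
# map from syndrome to error pattern is well defined.
table = {}
for e in product(range(2), repeat=N):
    if sum(e) <= T:
        assert syndrome(e) not in table
        table[syndrome(e)] = e


def correct(received):
    """Subtract the error pattern indicated by the syndrome, if it is known."""
    e = table.get(syndrome(received))
    if e is None:
        return None   # more than T errors: no entry in the table
    return tuple((r - x) % 2 for r, x in zip(received, e))


codeword = (1, 0, 0, 0, 1, 1, 0)          # has zero syndrome
received = (1, 0, 0, 1, 1, 1, 0)          # one symbol altered by the channel
corrected = correct(received)
```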

As discussed above, a code with minimum distance $d$ can tolerate up to
$t := \lfloor (d - 1)/2 \rfloor$ errors. Moreover, if the number of errors can
potentially be larger than $t$, then confusion becomes unavoidable and unique
decoding can no longer be guaranteed. However, the notion of list decoding
allows us to control the "amount of confusion" when the number of errors is
more than $t$. Namely, for a radius $\rho$ and integer $\ell$ (referred to as
the list size), a code $C \subseteq [q]^n$ is called $(\rho, \ell)$
list-decodable if the number of codewords within distance $\rho n$ of any
vector in $[q]^n$ is at most $\ell$. In this view, unique decoding corresponds
to the case $\ell = 1$, and a code with minimum distance $d$ is
$(\frac{1}{n} \lfloor (d - 1)/2 \rfloor, 1)$ list-decodable. However, for many
theoretical and practical purposes, a small (but possibly much larger than 1)
list size may be sufficient.
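
To see how the list size grows once the radius passes the unique-decoding threshold, one can count codewords in every Hamming ball by brute force. The sketch below does so for the binary $[7, 4, 3]$ Hamming code (an illustrative choice) and works with the integer radius $\lfloor \rho n \rfloor$ to avoid floating-point comparisons. The function name `max_list_size` is an assumption of this sketch.

```python
from itertools import product

# Generator matrix of the binary [7, 4, 3] Hamming code (illustrative choice).
G = [
    [1, 0, 0, 0, 1, 1, 0],
    [0, 1, 0, 0, 1, 0, 1],
    [0, 0, 1, 0, 0, 1, 1],
    [0, 0, 0, 1, 1, 1, 1],
]


def encode(x):
    """Return xG over F_2."""
    return tuple(sum(xi * row[j] for xi, row in zip(x, G)) % 2
                 for j in range(len(G[0])))


CODE = [encode(x) for x in product(range(2), repeat=len(G))]


def max_list_size(code, n, radius, q=2):
    """Largest number of codewords in any Hamming ball of the given radius.

    A code is (rho, l) list-decodable exactly when this value, computed with
    radius = floor(rho * n), is at most l. Exhaustive over all q^n centers.
    """
    return max(
        sum(sum(a != b for a, b in zip(y, c)) <= radius for c in code)
        for y in product(range(q), repeat=n)
    )


unique_radius_list = max_list_size(CODE, 7, 1)   # within t = 1
beyond_radius_list = max_list_size(CODE, 7, 2)   # one step past t
```

For this code the list size stays at 1 up to radius 1 and becomes larger at radius 2, which is the "amount of confusion" the definition quantifies.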
A.2 Bounds on codes

For positive integers $n, d, q$, denote by $A_q(n, d)$ the maximum size of a
code with length $n$ and minimum distance $d$ over a $q$-ary alphabet, and
define

\[ \alpha_q(\delta) := \lim_{n \to \infty} \frac{\log_q A_q(n, \delta n)}{n} \]

as the "highest" rate a code with relative distance $\delta$ can asymptotically
attain. The exact form of the function $\alpha_q(\cdot)$ is not known for any
$q$; however, certain lower and upper bounds for this quantity exist. In this
section, we briefly review some important bounds on $\alpha_q(\delta)$.

The Gilbert-Varshamov bound

Using the probabilistic method, it can be shown that a random linear code
(constructed by picking the entries of its generator, or parity check, matrix
uniformly and independently at random) with overwhelming probability attains a
dimension-distance tradeoff given by

\[ k \geq n (1 - h_q(d/n)), \]

where $h_q(\cdot)$ is the $q$-ary entropy function defined as

\[ h_q(x) := x \log_q(q - 1) - x \log_q(x) - (1 - x) \log_q(1 - x). \tag{A.1} \]

Thus we get the lower bound

\[ \alpha_q(\delta) \geq 1 - h_q(\delta) \]

on the function $\alpha_q(\cdot)$, known as the Gilbert-Varshamov bound.
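
The entropy function (A.1) and the resulting rate guarantee are easy to evaluate numerically. The following sketch assumes the usual convention $0 \log 0 = 0$ at the endpoints; the function names are illustrative.

```python
from math import log


def q_ary_entropy(x, q=2):
    """h_q(x) as in (A.1), using the convention 0 * log(0) = 0."""
    if not 0 <= x <= 1:
        raise ValueError("x must lie in [0, 1]")
    value = 0.0
    if x > 0:
        value += x * log(q - 1, q) - x * log(x, q)
    if x < 1:
        value -= (1 - x) * log(1 - x, q)
    return value


def gv_rate(delta, q=2):
    """Gilbert-Varshamov lower bound 1 - h_q(delta) on alpha_q(delta)."""
    return 1 - q_ary_entropy(delta, q)


# Binary codes with relative distance 0.11 exist at rate close to 1/2.
binary_rate_at_011 = gv_rate(0.11)
```

Since $h_q(1 - 1/q) = 1$, the bound is informative only for relative distances below $1 - 1/q$.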

The Singleton bound

On the negative side, the Singleton bound states that the minimum distance $d$
of any $q$-ary code with $q^k$ or more codewords must satisfy
$d \leq n - k + 1$. Codes that attain this bound with equality are known as
maximum distance separable (MDS) codes. Therefore we get that, regardless of
the alphabet size, one must have

\[ \alpha_q(\delta) \leq 1 - \delta. \]

Upper bounds for fixed alphabet size

When the alphabet size $q$ is fixed, there are numerous upper bounds known
for the function $\alpha_q(\cdot)$. Here we list several such bounds.

• Hamming (sphere packing) bound: $\alpha_q(\delta) \leq 1 - h_q(\delta/2)$.

• Plotkin bound: $\alpha_q(\delta) \leq \max\{0, 1 - \delta (q/(q - 1))\}$.

• McEliece, Rodemich, Rumsey, and Welch (MRRW) bound:

\[ \alpha_2(\delta) \leq h_2\left(\tfrac{1}{2} - \sqrt{\delta (1 - \delta)}\right). \]

For the binary alphabet, these bounds are depicted in Figure A.1.

Figure A.1: Bounds on binary codes: (1) Singleton bound, (2) Hamming
bound, (3) Plotkin bound, (4) MRRW bound, (5) Gilbert-Varshamov bound.
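
The binary bounds listed above can be compared numerically. The sketch below evaluates each of them at a given relative distance $\delta \in [0, 1/2]$; the function names are illustrative, and the clamp inside `mrrw` is a defensive assumption against floating-point rounding rather than part of the bound.

```python
from math import log, sqrt


def h2(x):
    """Binary entropy function, with h2(0) = h2(1) = 0."""
    if x <= 0 or x >= 1:
        return 0.0
    return -x * log(x, 2) - (1 - x) * log(1 - x, 2)


def singleton(delta):
    return 1 - delta


def hamming(delta):
    return 1 - h2(delta / 2)


def plotkin(delta):
    return max(0.0, 1 - 2 * delta)          # q / (q - 1) = 2 for q = 2


def mrrw(delta):
    return h2(max(0.0, 0.5 - sqrt(delta * (1 - delta))))


def gilbert_varshamov(delta):
    return 1 - h2(delta)


def tabulate(deltas):
    """Rows (delta, GV lower bound, smallest of the four upper bounds)."""
    rows = []
    for delta in deltas:
        upper = min(singleton(delta), hamming(delta),
                    plotkin(delta), mrrw(delta))
        rows.append((delta, gilbert_varshamov(delta), upper))
    return rows


table = tabulate([i / 20 for i in range(1, 10)])   # delta = 0.05, ..., 0.45
```

Every row has its lower bound below its upper bound; the gap between the two columns is the range in which $\alpha_2(\delta)$ is not pinned down by these bounds.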



The Johnson Bound on List Decoding 

Intuitively, it is natural to expect that a code with large minimum distance 
must remain a good list-decodable code when the list-decoding radius exceeds 
half the minimum distance. The Johnson bound makes this intuition rigorous. 
Below we quote a strengthened version of the bound. 



Theorem A.1. (cf. [74, Section 3.3]) Let $C$ be a $q$-ary code of length $n$ and
relative distance $\delta \geq (1 - 1/q)(1 - \delta')$ for some
$\delta' \in (0, 1)$. Then for any $\gamma > \sqrt{\delta'}$, $C$ is
$((1 - 1/q)(1 - \gamma), \ell)$ list-decodable for

\[ \ell = \min\left\{ n(q - 1),\; \frac{1 - \delta'}{\gamma^2 - \delta'} \right\}. \]

Moreover, the code $C$ is $((1 - 1/q)(1 - \sqrt{\delta'}), 2n(q - 1) - 1)$
list-decodable. □

As an immediate corollary, we get that any binary code with relative distance
at least $\frac{1}{2} - \epsilon$ is
$(\frac{1}{2} - \sqrt{\epsilon}, 1/(2\epsilon))$ list-decodable.
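
The corollary follows from Theorem A.1 by a direct substitution, spelled out below. The sketch assumes that the second term in the list-size bound of the theorem is $(1 - \delta')/(\gamma^2 - \delta')$ and that $0 < \epsilon < 1/4$, so that both $\delta'$ and $\gamma$ lie in $(0, 1)$.

```latex
% Apply Theorem A.1 with q = 2, \delta' = 2\epsilon, \gamma = 2\sqrt{\epsilon}.
\begin{align*}
(1 - 1/q)(1 - \delta') &= \tfrac{1}{2}(1 - 2\epsilon) = \tfrac{1}{2} - \epsilon, \\
\gamma^2 = 4\epsilon &> 2\epsilon = \delta', \\
(1 - 1/q)(1 - \gamma) &= \tfrac{1}{2}(1 - 2\sqrt{\epsilon}) = \tfrac{1}{2} - \sqrt{\epsilon}, \\
\ell &\leq \frac{1 - \delta'}{\gamma^2 - \delta'}
      = \frac{1 - 2\epsilon}{2\epsilon} \leq \frac{1}{2\epsilon}.
\end{align*}
```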



A.3 Reed-Solomon codes

Let $p = (p_1, \ldots, p_n)$ be a vector consisting of $n$ distinct elements of
$\mathbb{F}_q$ (assuming $q \geq n$). The evaluation vector of a polynomial
$f : \mathbb{F}_q \to \mathbb{F}_q$ with respect to $p$ is the vector
$f(p) := (f(p_1), \ldots, f(p_n)) \in \mathbb{F}_q^n$.

A Reed-Solomon code of length $n$ and dimension $k$ over $\mathbb{F}_q$ is the
set of evaluation vectors of all polynomials of degree at most $k - 1$ over
$\mathbb{F}_q$ with respect to a particular choice of $p$. The dimension of
this code is equal to $k$. A direct corollary of the Euclidean division
algorithm states that, over any field, the number of zeros of any nonzero
polynomial is less than or equal to its degree. Hence every nonzero codeword
has at most $k - 1$ zero coordinates, so the minimum distance of a
Reed-Solomon code is at least $n - k + 1$, and because of the Singleton bound,
is in fact equal to $n - k + 1$. Hence we see that a Reed-Solomon code is MDS.
A generator matrix for a Reed-Solomon code is given by the Vandermonde matrix

\[
G := \begin{pmatrix}
1 & 1 & \cdots & 1 \\
p_1 & p_2 & \cdots & p_n \\
\vdots & \vdots & \ddots & \vdots \\
p_1^{k-1} & p_2^{k-1} & \cdots & p_n^{k-1}
\end{pmatrix}.
\]
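
A small instance makes the MDS property concrete. The sketch below restricts to a prime field so that arithmetic is plain integer arithmetic modulo a prime (general prime powers need a proper finite-field implementation, which is not attempted here). The parameters $q = 7$, $n = 6$, $k = 3$ and all names are illustrative assumptions; `Q` in the code is the field size, not the vector of evaluation points.

```python
from itertools import product

Q = 7                          # field size; assumed prime in this sketch
POINTS = (0, 1, 2, 3, 4, 5)    # n = 6 distinct evaluation points in F_7
K = 3                          # dimension: polynomials of degree <= K - 1


def evaluate(coeffs, x, q=Q):
    """Horner evaluation of sum_i coeffs[i] * x^i modulo q."""
    acc = 0
    for c in reversed(coeffs):
        acc = (acc * x + c) % q
    return acc


def rs_encode(message, points=POINTS, q=Q):
    """Evaluation vector of the polynomial whose coefficients are `message`."""
    return tuple(evaluate(message, x, q) for x in points)


def vandermonde(points=POINTS, k=K, q=Q):
    """The k x n generator matrix whose row i holds the i-th powers."""
    return [[pow(x, i, q) for x in points] for i in range(k)]


codewords = [rs_encode(m) for m in product(range(Q), repeat=K)]
min_distance = min(sum(c != 0 for c in w) for w in codewords if any(w))
```

Here `min_distance` equals $n - k + 1 = 4$, matching the Singleton bound, and encoding by evaluation agrees with multiplying the message by the Vandermonde matrix.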



A.4 The Hadamard Code

The Hadamard code of dimension $n$ is a linear binary code of length $2^n$ whose
generator matrix can be obtained by arranging all binary sequences of length
$n$ as its columns. Each codeword of the Hadamard code can thus be seen as
the truth table of a linear form

\[ \ell(x_1, \ldots, x_n) = \sum_{i=1}^{n} \alpha_i x_i \]

over the binary field. Since a nonzero linear form over the binary field
vanishes on exactly half of the $2^n$ inputs, each nonzero codeword must have
weight exactly $2^{n-1}$, implying that the relative distance of the Hadamard
code is $\frac{1}{2}$.
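
The truth-table description gives an immediate encoder. The sketch below builds the Hadamard code of dimension 4 and collects the weights of its nonzero codewords; the function names are illustrative.

```python
from itertools import product


def hadamard_codeword(alpha):
    """Truth table of x -> sum_i alpha_i * x_i over F_2, for all x in {0,1}^n."""
    n = len(alpha)
    return tuple(sum(a * x for a, x in zip(alpha, point)) % 2
                 for point in product(range(2), repeat=n))


def hadamard_code(n):
    """All 2^n codewords of the Hadamard code of dimension n (length 2^n)."""
    return [hadamard_codeword(alpha) for alpha in product(range(2), repeat=n)]


code = hadamard_code(4)
nonzero_weights = {sum(w) for w in code if any(w)}   # expected: only 2^(n-1)
```

The code has length exponential in its dimension, so its rate $n / 2^n$ tends to zero; this is the price paid for relative distance $\frac{1}{2}$.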

A.5 Concatenated Codes

Concatenation is a classical operation on codes that is mainly used for reducing
the alphabet size of a code. Suppose that $C_1$ (called the outer code) is an
$(n_1, k_1, d_1)_Q$-code and $C_2$ (called the inner code) is an
$(n_2, k_2, d_2)_q$-code, where $Q = q^{k_2}$. The concatenation of $C_1$ and
$C_2$, which we denote by $C_1 \circ C_2$, is an $(n, k, d)_q$-code that can be
conveniently defined by its encoder mapping as follows.

Let $x = (x_1, \ldots, x_{k_1}) \in [Q]^{k_1}$ be the message given to the
encoder, and $C_1(x) = (c_1, \ldots, c_{n_1}) \in C_1$ be its encoding under
$C_1$. Each $c_i$ is thus an element of $[q^{k_2}]$ and can be seen as a
$q$-ary string of length $k_2$. Denote by $c'_i \in [q]^{n_2}$ the encoding of
this string under $C_2$. Then the encoding of $x$ by the concatenated code
$C_1 \circ C_2$ is the $q$-ary string of length $n_1 n_2$

\[ (c'_1, \ldots, c'_{n_1}) \]

consisting of the string concatenation of symbol-wise encodings of $C_1(x)$
using $C_2$.

Immediately from the above definition, one can see that $n = n_1 n_2$ and
$k = k_1 k_2$. Moreover, it is straightforward to observe that the minimum
distance of the concatenated code satisfies $d \geq d_1 d_2$. When $C_1$ and
$C_2$ are linear codes, then so is $C_1 \circ C_2$.
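
The encoder mapping can be sketched on a toy instance. Below, the outer code is a hypothetical $(3, 2, 2)_4$ code over the alphabet $[4]$ (two symbols followed by their sum modulo 4, chosen only to keep the example short; it is not a code over a field), and the inner code is the Hadamard code of dimension $k_2 = 2$, so that $Q = q^{k_2} = 4$. All names are illustrative.

```python
from itertools import product


def outer_encode(x):
    """Toy (3, 2, 2)_4 outer code: append the sum of the two symbols mod 4."""
    return (x[0], x[1], (x[0] + x[1]) % 4)


def symbol_to_bits(c, k2=2):
    """View a symbol of [2^k2] as a binary string of length k2."""
    return tuple((c >> i) & 1 for i in range(k2))


def inner_encode(bits):
    """Hadamard code of dimension len(bits): a (4, 2, 2)_2 code for k2 = 2."""
    return tuple(sum(a * x for a, x in zip(bits, point)) % 2
                 for point in product(range(2), repeat=len(bits)))


def concatenated_encode(x):
    """Encode with the outer code, then encode each symbol with the inner code."""
    out = []
    for c in outer_encode(x):
        out.extend(inner_encode(symbol_to_bits(c)))
    return tuple(out)


def min_distance(code):
    """Smallest Hamming distance between two distinct words of `code`."""
    return min(sum(a != b for a, b in zip(u, v))
               for i, u in enumerate(code) for v in code[i + 1:])


messages = list(product(range(4), repeat=2))
code = [concatenated_encode(x) for x in messages]
d1 = min_distance([outer_encode(x) for x in messages])
d2 = min_distance([inner_encode(b) for b in product(range(2), repeat=2)])
d = min_distance(code)
```

The resulting binary code has length $n_1 n_2 = 12$ and $4^2 = 2^4$ codewords, and the computed `d` can be compared against the guarantee $d \geq d_1 d_2$.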

As an example, let $C_1$ be a Reed-Solomon code of length $n_1 := 2^{k_2}$ and
dimension $k_1 := 2\delta n_1$ over $\mathbb{F}_Q$, where $Q := 2^{k_2}$. Thus
the relative distance of $C_1$ is $1 - 2\delta + 1/n_1$, which is at least
$1 - 2\delta$. As the inner code $C_2$, take the Hadamard code of dimension
$k_2$ and length $Q$. The concatenated code $C := C_1 \circ C_2$ will thus have
length $n := Q n_1 = 2^{2 k_2}$, dimension $k := \delta k_2 2^{k_2 + 1}$, and
relative distance at least $\frac{1}{2} - \delta$. Therefore, we obtain a
binary $[n, k, d]$ code where $d \geq (\frac{1}{2} - \delta) n$ and
$n \leq (k/\delta)^2$. By the Johnson bound (Theorem A.1), this code must be
$(\frac{1}{2} - \sqrt{\delta}, \ell)$ list-decodable with list size at most
$1/(2\delta)$.
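
The parameters in this example can be checked with exact arithmetic for specific values. The sketch below assumes that $2\delta n_1$ is an integer (otherwise the dimension $k_1$ is not well defined) and uses the illustrative values $k_2 = 4$ and $\delta = 1/8$; the function name and the dictionary keys are hypothetical.

```python
from fractions import Fraction


def example_parameters(k2, delta):
    """Parameters of the Reed-Solomon / Hadamard concatenation for given k2, delta."""
    delta = Fraction(delta)
    n1 = 2 ** k2                       # outer length; also Q = 2^k2
    k1 = 2 * delta * n1                # outer dimension
    if k1.denominator != 1 or not 0 < k1 <= n1:
        raise ValueError("2 * delta * n1 must be an integer in (0, n1]")
    k1 = int(k1)
    d1 = n1 - k1 + 1                   # Reed-Solomon codes are MDS
    n2, d2 = 2 ** k2, 2 ** (k2 - 1)    # Hadamard code of dimension k2
    return {
        "n": n1 * n2,
        "k": k1 * k2,
        "d_lower": d1 * d2,            # guarantee d >= d1 * d2
        "list_size_bound": 1 / (2 * delta),
    }


delta = Fraction(1, 8)
params = example_parameters(4, delta)
```

For these values the code has length 256 and dimension 16, and the distance guarantee $d_1 d_2 = 104$ is indeed at least $(\frac{1}{2} - \delta) n = 96$.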

We remark that binary codes with relative minimum distance
$\frac{1}{2} - \delta$ and rate $\Omega(\delta^3 \log(1/\delta))$ (which only
depends on the parameter $\delta$) can be obtained by concatenating Geometric
Goppa codes on the Tsfasman-Vlăduţ-Zink bound (see Section 4.3.2) with the
Hadamard code. The Gilbert-Varshamov bound implies that binary codes with
relative distance $\frac{1}{2} - \delta$ and rate $\Omega(\delta^2)$ exist,
and on the other hand, we know by the MRRW bound that
$O(\delta^2 \log(1/\delta))$ is the best rate one can hope for.



Profondément calme (Dans une brume doucement sonore)

Claude Debussy (1862-1918): Préludes, Book I, No. X
(La cathédrale engloutie).